ProbMods — ppl-gym

probmods2-agents-as-programs / ex1.a

answer dist/bool solver accept pyro pass 0.0000

00 statement source: exercises/04.1-agents-as-programs.md

given

A single fair coin (heads probability 0.5 a priori). A soft-conditioning factor is applied: when the coin lands heads, a log-weight of 3 is added; when it lands tails, no weight is added.

model

A Bernoulli trial with prior probability 0.5 for heads. The outcome is soft-conditioned by adding log-weight 3 for heads and 0 for tails, then the result is enumerated exactly.

query

The posterior distribution over whether the coin lands heads.
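
As a cross-check on the enumeration, the posterior can be worked out by hand: each outcome's unnormalized weight is its prior probability times exp(log-weight). A minimal pure-Python sketch, where the function name `soft_conditioned_coin` is illustrative and not part of ppl-gym:

```python
import math

def soft_conditioned_coin(prior_heads=0.5, log_weight_heads=3.0):
    """Exact posterior for a coin whose heads outcome gets an extra log-weight.

    Hypothetical helper for illustration; defaults follow the problem statement.
    """
    w_heads = prior_heads * math.exp(log_weight_heads)   # prior * exp(3)
    w_tails = (1.0 - prior_heads) * math.exp(0.0)        # prior * exp(0)
    z = w_heads + w_tails                                # normalizing constant
    return {True: w_heads / z, False: w_tails / z}

posterior = soft_conditioned_coin()
```

With the stated numbers this reduces to P(heads) = e^3 / (1 + e^3), roughly 0.953.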

answer spec dist/bool

{
  "kind": "dist",
  "domain": "bool"
}

system prompt constant across problems

(system prompt loads here)

webppl primer solver context

(primer loads here)

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var ANSWER = (Infer({method: 'enumerate'}, function () {
  var A = flip();
  factor(A*3);
  return A;
}));

◆ realization 0.000

python

# A ~ flip(); factor(A*3); posterior over A. Exact discrete enumeration via Pyro.
# Imports added so the snippet runs standalone; the original harness may inject them.
import pyro
import pyro.distributions as dist

@pyro.infer.config_enumerate
def model():
    A = pyro.sample("A", dist.Bernoulli(0.5))
    pyro.factor("f", A * 3.0)
    return A

marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
m = marg["A"]
sup = m.enumerate_support()
probs = m.log_prob(sup).exp()

# Map the enumerated support {0.0, 1.0} to boolean keys.
ANSWER = {}
for s, p in zip(sup.tolist(), probs.tolist()):
    ANSWER[bool(int(s))] = p

02 answer overlay: webppl vs pyro · dist/bool

webppl · pyro · 2 bins

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0000 (tv)
solver re-derivation	accept	2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000

probmods2-agents-as-programs / ex1.b

answer dist/bool solver accept pyro pass 0.0000

00 statement source: exercises/04.1-agents-as-programs.md

given

Three independent fair coins, each with heads probability 0.5. A soft-conditioning factor adds log-weight 1 to outcomes where exactly 2 of the 3 coins land heads, and 0 otherwise.

model

Three independent Bernoulli trials, each with prior probability 0.5 for heads. The joint outcome is soft-weighted by adding log-weight 1 when the total number of heads equals exactly 2. The marginal distribution over the first coin is computed by exact enumeration.

query

The posterior marginal distribution over whether the first coin lands heads.
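
The marginal can be checked by brute force over the 8 joint outcomes: sum prior times exp(log-weight) for each value of the first coin, then normalize. A minimal sketch, assuming only the numbers in the statement; `first_coin_posterior` is a hypothetical name:

```python
import itertools
import math

def first_coin_posterior(log_weight=1.0):
    """Marginal over the first coin after soft-weighting 'exactly two heads'."""
    weights = {True: 0.0, False: 0.0}
    for a, b, c in itertools.product([False, True], repeat=3):
        # Log-weight applies only when exactly 2 of the 3 coins are heads.
        lw = log_weight if (a + b + c) == 2 else 0.0
        weights[a] += (0.5 ** 3) * math.exp(lw)
    z = sum(weights.values())
    return {value: w / z for value, w in weights.items()}

posterior = first_coin_posterior()
```

Counting outcomes gives the closed form P(first coin heads) = (2e + 2) / (3e + 5), a bit above one half.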

answer spec dist/bool

{
  "kind": "dist",
  "domain": "bool"
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var ANSWER = (Infer({}, function() {
    var a = flip(0.5);
    var b = flip(0.5);
    var c = flip(0.5);
    factor(1*((a+b+c)==2));
    return a;
}));

◆ realization 0.000

python

# Imports added so the snippet runs standalone; the original harness may inject them.
import torch
import pyro
import pyro.distributions as dist

@pyro.infer.config_enumerate
def model():
    a = pyro.sample("a", dist.Bernoulli(0.5))
    b = pyro.sample("b", dist.Bernoulli(0.5))
    c = pyro.sample("c", dist.Bernoulli(0.5))
    total = a + b + c
    # Log-weight 1 when exactly two coins are heads, 0 otherwise.
    lw = torch.where(total == 2, torch.tensor(1.0), torch.tensor(0.0))
    pyro.factor("two_heads", lw)
    return a

marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
_m = marg["a"]
_p_true = _m.log_prob(torch.tensor(1.0)).exp().item()
ANSWER = {False: 1.0 - _p_true, True: _p_true}

02 answer overlay: webppl vs pyro · dist/bool

webppl · pyro · 2 bins

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0000 (tv)
solver re-derivation	accept	2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000

probmods2-agents-as-programs / ex2.a

answer dist/int solver accept pyro pass 0.0000

00 statement source: exercises/04.1-agents-as-programs.md

given

A proposer splits $10 with a responder in $1 increments; possible offers are integers 0 through 10. The responder accepts any offer of $1 or more and rejects an offer of $0. If the offer is accepted, the proposer receives $10 minus the offer; if rejected, the proposer receives $0.

model

The proposer's offer is drawn uniformly over {0, 1, ..., 10}. The responder's accept/reject decision is deterministic: accept iff offer > 0. The proposer's reward is soft-maximized by using the reward as the factor weight. Exact enumeration over all offers.

query

The soft-maximizing distribution over the proposer's offer.
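
Because the responder is deterministic, the soft-max distribution is a plain softmax of the reward over the 11 offers. A minimal sketch under that reading; `softmax_offer_distribution` is an illustrative name, not part of the benchmark:

```python
import math

def softmax_offer_distribution(offers=range(11), total=10):
    """P(offer) proportional to exp(reward) with a deterministic responder."""
    weights = {}
    for offer in offers:
        accepted = offer > 0                       # responder accepts iff offer > 0
        reward = (total - offer) if accepted else 0
        weights[offer] = math.exp(reward)          # uniform prior cancels out
    z = sum(weights.values())
    return {offer: w / z for offer, w in weights.items()}

posterior = softmax_offer_distribution()
best_offer = max(posterior, key=posterior.get)
```

The mode is the $1 offer, which carries about 0.63 of the mass; offers of $0 and $10 both yield reward 0 and therefore receive equal probability.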

answer spec dist/int

{
  "kind": "dist",
  "domain": "int"
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var responder = function(offer) {
    return (offer>0 ? true : false);
};
var ANSWER = (Infer({method: "enumerate"}, function(){
    var offer = uniformDraw([0,1,2,3,4,5,6,7,8,9,10]);
    var reward = responder(offer) ? (10 - offer) : 0;
    factor(reward);
    return offer;
}));

◆ realization 0.000

python

# probmods2-agents-as-programs/ex2.a
# Proposer offer ~ Uniform{0..10}; responder accepts iff offer > 0.
# Reward (10-offer if accepted else 0) is used as the factor weight, so the
# proposer's offer is soft-maximized. Exact enumeration over the 11 offers.
# Imports added so the snippet runs standalone; the original harness may inject them.
import torch
import pyro
import pyro.distributions as dist

offers = torch.arange(0, 11)  # 0..10, index == offer value

@pyro.infer.config_enumerate
def model():
    offer = pyro.sample("offer", dist.Categorical(torch.ones(11) / 11.0))
    accepted = offer > 0
    reward = torch.where(accepted, (10 - offer).double(), torch.tensor(0.0).double())
    pyro.factor("reward", reward)

marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
offer_marg = marg["offer"]
probs = offer_marg.probs.detach()

ANSWER = {int(offers[i].item()): float(probs[i].item()) for i in range(11)}

02 answer overlay: webppl vs pyro · dist/int

webppl · pyro · 11 bins · 0 … 10

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0000 (w1)
solver re-derivation	accept	2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000

probmods2-agents-as-programs / ex2.b

answer dist/int solver accept pyro pass 0.0000

00 statement source: exercises/04.1-agents-as-programs.md

given

A proposer splits $10 with a responder in $1 increments; possible offers are integers 0 through 10. The responder accepts the offer with probability (offer/10)^2 (i.e., the fraction of $10 given to the responder, squared). If accepted, the proposer receives $10 minus the offer; if rejected, the proposer receives $0.

model

The proposer's offer is drawn uniformly over {0, 1, ..., 10}. The proposer's reward is soft-maximized by placing a factor equal to the realized reward inside a joint enumeration over all (offer, accept/reject) outcome pairs, where the realized reward is (10 − offer) if the responder accepts and 0 if rejected. Exact enumeration over all offers and both accept/reject outcomes.

query

The soft-maximizing distribution over the proposer's offer.
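
Since the factor sits inside the joint enumeration, each offer's weight is the expected value of exp(realized reward) over accept/reject, not exp of the expected reward. A minimal sketch of that sum; `proposer_distribution` is a hypothetical helper name:

```python
import math

def proposer_distribution(alpha=2.0, total=10):
    """Marginal over offers after summing out the responder's accept/reject."""
    weights = {}
    for offer in range(total + 1):
        p_accept = (offer / total) ** alpha
        # accept -> factor(total - offer); reject -> factor(0)
        weights[offer] = (p_accept * math.exp(total - offer)
                          + (1.0 - p_accept) * math.exp(0.0))
    z = sum(weights.values())
    return {offer: w / z for offer, w in weights.items()}

posterior = proposer_distribution()
best_offer = max(posterior, key=posterior.get)
```

With alpha = 2 the mode shifts from $1 (deterministic responder) to $2, because very low offers are now almost always rejected.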

answer spec dist/int

{
  "kind": "dist",
  "domain": "int"
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var alpha = 2;

var responder = function(offer, alpha) {
    var p = Math.pow(offer/10,alpha);
    return flip(p);
};
var ANSWER = (Infer({method: "enumerate"}, function(){
    var offer = uniformDraw([0,1,2,3,4,5,6,7,8,9,10]);
    var reward = responder(offer,alpha) ? (10 - offer) : 0;
    factor(reward);
    return offer;
}));

◆ realization 0.000

python

# Soft-maximizing proposer: offer ~ uniformDraw(0..10), responder accepts with
# p = (offer/10)^alpha, reward = (10-offer) if accept else 0, factor(reward).
# Marginal posterior over offer. The accept latent is discrete and enumerable;
# run exact Pyro enumeration over both offer and accept.
# Imports added so the snippet runs standalone; the original harness may inject them.
import torch
import pyro
import pyro.distributions as dist

alpha = 2.0
offers = list(range(11))
n_off = len(offers)
accept_p = torch.tensor([ (o / 10.0) ** alpha for o in offers ])  # per-offer accept prob

@pyro.infer.config_enumerate
def model():
    offer = pyro.sample("offer", dist.Categorical(torch.ones(n_off) / n_off))
    p = accept_p[offer]
    accept = pyro.sample("accept", dist.Bernoulli(p))
    reward = torch.where(accept.bool(),
                         (10.0 - offer.double()),
                         torch.tensor(0.0))
    pyro.factor("f", reward)
    return offer

marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
m = marg["offer"]
sup = m.enumerate_support()
probs = m.log_prob(sup).exp()

ANSWER = {}
for s, p in zip(sup.tolist(), probs.tolist()):
    ANSWER[int(s)] = p

02 answer overlay: webppl vs pyro · dist/int

webppl · pyro · 11 bins · 0 … 10

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0000 (w1)
solver re-derivation	accept	2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000

probmods2-agents-as-programs / ex2.d

answer dist/real solver accept pyro pass 0.0851

00 statement source: exercises/04.1-agents-as-programs.md

given

A responder accepts an offer o with probability (o/10)^alpha, where alpha controls sensitivity to the offer. The proposer's prior over alpha is uniform on the interval [0.5, 5]. In a single round, the proposer offered $2 and the responder rejected (i.e., the proposer's payoff was 0).

model

The proposer holds a continuous prior over the responder's sensitivity parameter alpha. Given the observed rejection of a $2 offer, the posterior over alpha is updated via Bayesian conditioning.

query

The posterior distribution over alpha given the rejection, obtained via MCMC with 50000 samples.
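
With a uniform prior, the posterior density is proportional to the rejection likelihood, 1 - (2/10)^alpha, on [0.5, 5]. The MCMC output can be sanity-checked against a deterministic grid approximation. A minimal sketch using a midpoint rule; `rejection_posterior` and the grid size are illustrative assumptions:

```python
import math

def rejection_posterior(offer=2.0, total=10.0, lo=0.5, hi=5.0, n=20000):
    """Midpoint-rule approximation of p(alpha | offer rejected).

    Returns (grid, normalized density on the grid, posterior mean, evidence integral).
    """
    width = (hi - lo) / n
    alphas = [lo + (i + 0.5) * width for i in range(n)]
    # Uniform prior is constant, so only the likelihood of rejection matters.
    unnorm = [1.0 - (offer / total) ** a for a in alphas]
    z = sum(unnorm) * width
    density = [u / z for u in unnorm]
    mean = sum(a * d for a, d in zip(alphas, density)) * width
    return alphas, density, mean, z

alphas, density, posterior_mean, z = rejection_posterior()
```

The likelihood increases with alpha, so the posterior leans toward larger alpha: the posterior mean comes out near 2.86, above the prior mean of 2.75.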

answer spec dist/real

{
  "kind": "dist",
  "domain": "real"
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var responder = function(offer, alpha) {
    var p = Math.pow(offer/10,alpha);
    return flip(p);
};
var ANSWER = (Infer({method: "MCMC", samples:50000}, function(){
    var alpha = uniform(0.5,5);
    var offer = 2;
    var reward = responder(offer, alpha) ? (10 - offer) : 0;
    condition(reward==0);
    return alpha;
}));

◆ realization 0.085

python

# Imports added so the snippet runs standalone; the original harness may inject them.
import torch
import pyro
import pyro.distributions as dist

def model():
    alpha = pyro.sample("alpha", dist.Uniform(0.5, 5.0))
    # responder accepts offer 2 with prob (2/10)^alpha; rejection observed (reward==0)
    p_accept = torch.pow(torch.tensor(2.0 / 10.0), alpha)
    # The accept/reject flip is marginalized analytically: log P(reject) = log(1 - p_accept).
    log_p_reject = torch.log1p(-p_accept)
    pyro.factor("reject", log_p_reject)
    return alpha

nuts = pyro.infer.NUTS(model)
mcmc = pyro.infer.MCMC(nuts, num_samples=1000, warmup_steps=400)
mcmc.run()
ANSWER = mcmc.get_samples()["alpha"]

02 answer overlay: webppl vs pyro · dist/real

webppl · pyro · 24363 bins · 0.50 … 5.00

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0220 (w1)
solver re-derivation	accept	2/2 solvers · d=[0.011, 0.011] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0851 ≤ tol 0.3747 · floors 0.1873/0.0220

probmods2-agents-as-programs / ex2.e

answer dist/int solver accept pyro pass 0.1200

00 statement source: exercises/04.1-agents-as-programs.md

given

Ultimatum game: a responder accepts offer o with probability (o/10)^alpha, where alpha controls sensitivity. The proposer's prior over alpha is uniform on [0.5, 5]. Offers are integer dollar amounts from 0 to 10 inclusive. The proposer's payoff is (10 - offer) if the offer is accepted, and 0 if rejected. Round 1: the proposer offered $2 and the responder rejected.

model

After observing the round-1 rejection, the proposer updates beliefs about alpha. In round 2 the proposer is a rational agent: given a draw of alpha from the updated posterior, the proposer samples an offer, simulates the responder once (accepting with probability (offer/10)^alpha), and adds the realized payoff — (10 − offer) if accepted, 0 if rejected — to the log-weight (a softmax agent over realized outcomes, not expected-payoff weighting).

query

The marginal distribution over the proposer's round-2 offer (an integer dollar amount from 0 to 10), under the two-stage model described above.
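
The two-stage marginal is a mixture: the round-2 softmax proposer for each alpha, averaged under the round-1 posterior over alpha. A deterministic grid version of that mixture can serve as a reference for the sampled answers. This is a sketch under the model as described above, not the benchmark's method; `round2_offer_marginal` and the grid size are assumptions:

```python
import math

def round2_offer_marginal(n_grid=2000, lo=0.5, hi=5.0, total=10, observed_offer=2):
    """Average the round-2 proposer over the round-1 posterior on alpha."""
    offers = list(range(total + 1))
    width = (hi - lo) / n_grid
    marginal = [0.0] * len(offers)
    z_alpha = 0.0
    for i in range(n_grid):
        alpha = lo + (i + 0.5) * width
        # Stage 1: posterior weight of alpha given the rejected round-1 offer.
        w_alpha = 1.0 - (observed_offer / total) ** alpha
        z_alpha += w_alpha
        # Stage 2: softmax over realized outcomes for this alpha.
        weights = []
        for offer in offers:
            p = (offer / total) ** alpha
            weights.append(p * math.exp(total - offer) + (1.0 - p))
        z = sum(weights)
        for j, w in enumerate(weights):
            marginal[j] += w_alpha * w / z
    return {o: m / z_alpha for o, m in zip(offers, marginal)}

marginal = round2_offer_marginal()
```

Offers of $0 and $10 both realize a payoff of 0 for every alpha, so they end up with identical mass in the mixture.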

answer spec dist/int

{
  "kind": "dist",
  "domain": "int"
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var responder = function(offer, alpha) {
    var p = Math.pow(offer/10,alpha);
    return flip(p);
};

var proposer1 = Infer({method: "MCMC", samples:50000}, function(){
    var alpha = uniform(0.5,5);
    var offer1 = 2;
    var reward1 = responder(offer1, alpha) ? (10 - offer1) : 0;
    condition(reward1==0);
    return alpha;
});
var ANSWER = (Infer({method: "forward", samples:1000}, function(){
    var alpha2 = sample(proposer1);
    var proposer2 = Infer({method: "MCMC", samples:5000}, function(){
        var offer2 = uniformDraw([0,1,2,3,4,5,6,7,8,9,10]);
        var reward2 = responder(offer2, alpha2) ? (10 - offer2) : 0;
        factor(reward2);
        return offer2;
    });
    return sample(proposer2);
}));

◆ realization 0.120

python

# Two-stage agent model.
# Stage 1: alpha posterior given round-1 rejection of offer=2, via MCMC (NUTS).
#   The discrete responder accept/reject for offer1 is enumerated away with
#   config_enumerate so NUTS samples only the continuous alpha.
# Stage 2: for each of 1000 outer alpha draws, build the round-2 proposer
#   distribution by EXACT Pyro enumeration over the finite offer support (the
#   accept latent enumerated), draw one offer from it, aggregate.
# Imports added so the snippet runs standalone; the original harness may inject them.
from collections import Counter

import torch
import pyro
import pyro.distributions as dist

responder_p = lambda offer, alpha: (offer / 10.0) ** alpha

# ----- Stage 1: alpha | round-1 rejection, via MCMC -----
@pyro.infer.config_enumerate
def proposer1_model():
    alpha = pyro.sample("alpha", dist.Uniform(0.5, 5.0))
    offer1 = 2.0
    p1 = (offer1 / 10.0) ** alpha
    # reward1 = (10-offer1) if accept else 0; condition reward1 == 0 i.e. reject.
    accept1 = pyro.sample("accept1", dist.Bernoulli(p1))
    reward1 = torch.where(accept1.bool(), torch.tensor(8.0), torch.tensor(0.0))
    # condition(reward1 == 0)  <=>  reject  <=>  accept1 == 0
    pyro.factor("rej", torch.where(reward1 == 0.0, torch.tensor(0.0),
                                   torch.tensor(float("-inf"))))
    return alpha

mcmc = pyro.infer.MCMC(pyro.infer.NUTS(proposer1_model),
                       num_samples=2000, warmup_steps=500)
mcmc.run()
alpha_post = mcmc.get_samples()["alpha"]  # 1-D tensor of alpha draws

offers = list(range(11))
n_off = len(offers)
offers_t = torch.tensor([float(o) for o in offers])


def proposer2_probs(alpha2):
    # exact enumeration of the round-2 proposer for a fixed alpha2.
    ap = torch.tensor([ (o / 10.0) ** alpha2 for o in offers ])

    @pyro.infer.config_enumerate
    def model():
        offer2 = pyro.sample("offer2", dist.Categorical(torch.ones(n_off) / n_off))
        p = ap[offer2]
        accept = pyro.sample("accept2", dist.Bernoulli(p))
        reward2 = torch.where(accept.bool(), (10.0 - offer2.double()),
                              torch.tensor(0.0))
        pyro.factor("f", reward2)
        return offer2
    marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
    m = marg["offer2"]
    sup = m.enumerate_support()
    probs = m.log_prob(sup).exp()
    full = torch.zeros(n_off)
    for s, p in zip(sup.tolist(), probs.tolist()):
        full[int(s)] = p
    return full


# ----- Stage 2: outer forward sampling over alpha posterior -----
counts = Counter()
n_outer = 1000
idx = torch.randint(0, alpha_post.shape[0], (n_outer,))
for i in range(n_outer):
    alpha2 = float(alpha_post[int(idx[i])].item())
    probs = proposer2_probs(alpha2)
    drawn = int(pyro.sample(f"draw_{i}", dist.Categorical(probs)).item())
    counts[drawn] += 1

ANSWER = {o: counts[o] / n_outer for o in offers if counts[o] > 0}

02 answer overlay: webppl vs pyro · dist/int

webppl · pyro · 11 bins · 0 … 10

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.2360 (w1)
solver re-derivation	accept	2/2 solvers · d=[0.116, 0.116] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.1200 ≤ tol 0.4720 · floors 0.1480/0.2360

probmods2-agents-as-programs / ex3

answer dist/bool solver accept pyro pass 0.0000

00 statement source: exercises/04.1-agents-as-programs.md

given

Prisoner's dilemma: if the focal thief confesses (regardless of what the other does), she receives a lenient sentence of 6 years. If she does not confess but the other does, she receives 10 years. If neither confesses, she goes free (0 years). The other thief independently decides to confess with probability 0.5. The soft-conditioning weight for each joint outcome is (10 - years_in_jail) / 10.

model

Both thieves independently and uniformly decide whether to confess. The focal thief's years in jail follow the payoff matrix above. Each joint outcome's unnormalized log-weight is increased by (10 − years) / 10, so unnormalized weights are proportional to exp((10 − years) / 10) — not a multiplicative weight.

query

The posterior distribution over whether the focal thief confesses, under the soft-conditioning scheme described.
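
The four joint outcomes are few enough to enumerate by hand: each gets weight 0.25 * exp((10 - years) / 10), and the marginal over the focal thief's choice follows by summing and normalizing. A minimal sketch; `confess_posterior` is an illustrative name:

```python
import math

def confess_posterior(lenient=6, harsh=10):
    """Marginal over the focal thief's choice under the soft-conditioning weight."""
    weights = {True: 0.0, False: 0.0}
    for other_confesses in (False, True):
        for i_confess in (False, True):
            if i_confess:
                years = lenient                          # lenient sentence either way
            else:
                years = harsh if other_confesses else 0  # betrayed vs. both silent
            log_weight = (10 - years) / 10
            weights[i_confess] += 0.25 * math.exp(log_weight)
    z = sum(weights.values())
    return {choice: w / z for choice, w in weights.items()}

posterior = confess_posterior()
```

In closed form, P(confess) = 2e^0.4 / (2e^0.4 + 1 + e), which is a little under one half (about 0.445).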

answer spec dist/bool

{
  "kind": "dist",
  "domain": "bool"
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var thiefRats = function(){
  return flip();
};

var lenient = 6;
var ANSWER = (Infer({}, function(){
  var otherThiefRats = thiefRats();
  var IRat = thiefRats();
  var years = (otherThiefRats?
              (IRat? lenient : 10) :
              (IRat? lenient : 0));
  var percentYearsFreedom = (10-years)/10;
  factor(percentYearsFreedom);
  return IRat;
}));

◆ realization 0.000

python

# Imports added so the snippet runs standalone; the original harness may inject them.
import torch
import pyro
import pyro.distributions as dist

lenient = 6

@pyro.infer.config_enumerate
def model():
    other_rats = pyro.sample("other", dist.Bernoulli(0.5))
    i_rats = pyro.sample("i_rats", dist.Bernoulli(0.5))
    # years: if other confesses -> (I confess? 6 : 10); else -> (I confess? 6 : 0)
    years = torch.where(
        other_rats.bool(),
        torch.where(i_rats.bool(), torch.tensor(float(lenient)), torch.tensor(10.0)),
        torch.where(i_rats.bool(), torch.tensor(float(lenient)), torch.tensor(0.0)),
    )
    percent_freedom = (10.0 - years) / 10.0
    pyro.factor("soft", percent_freedom)
    return i_rats

marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
m = marg["i_rats"]
sup = m.enumerate_support()
probs = m.log_prob(sup).exp()
ANSWER = {bool(int(sup[i].item())): float(probs[i]) for i in range(len(sup))}

02 answer overlay: webppl vs pyro · dist/bool

webppl · pyro · 2 bins

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0000 (tv)
solver re-derivation	accept	2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000

probmods2-agents-as-programs / ex4.a

answer record(alpha_001, alpha_1, alpha_4, alpha_10) solver accept pyro pass 0.0000

00 statement source: exercises/04.1-agents-as-programs.md

given

Three objects in the world, each described by a {shape, color} pair: {square, blue}, {circle, blue}, {square, green}, drawn with equal probability. Four possible utterances: 'blue', 'green', 'square', 'circle'. Truth function: an utterance about color is true iff it matches the object's color; an utterance about shape is true iff it matches the object's shape; all other utterances are vacuously true.

model

RSA (Rational Speech Acts) model with three levels. The literal listener infers the object by combining the uniform prior with the truth function. The speaker chooses utterances with probability proportional to exp(alpha * log P(object | utterance)) under the literal listener, where alpha is a rationality parameter. The pragmatic listener infers the object from the speaker's distribution, combining the uniform prior with the speaker's probability of the utterance.

query

The pragmatic listener's posterior distribution over objects given the utterance 'blue', computed for rationality parameters alpha = 0.01, 1, 4, and 10. Return a record with fields alpha_001, alpha_1, alpha_4, and alpha_10.
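
The three RSA levels are small enough to compute with plain dictionaries, which makes the effect of alpha easy to inspect. The sketch below works in probability space, so exp(alpha * log p) becomes p ** alpha and zero-probability utterances need no special casing. All function names are illustrative and objects are encoded as (shape, color) tuples for convenience:

```python
OBJECTS = [("square", "blue"), ("circle", "blue"), ("square", "green")]
UTTERANCES = ["blue", "green", "square", "circle"]

def meaning(utterance, obj):
    shape, color = obj
    if utterance in ("blue", "green"):
        return utterance == color
    return utterance == shape

def normalize(weights):
    z = sum(weights.values())
    return {key: w / z for key, w in weights.items()}

def literal_listener(utterance):
    # Uniform prior over objects, conditioned on the utterance being true.
    return normalize({o: 1.0 if meaning(utterance, o) else 0.0 for o in OBJECTS})

def speaker(obj, alpha):
    # P(utterance | obj) proportional to P_literal(obj | utterance) ** alpha.
    return normalize({u: literal_listener(u)[obj] ** alpha for u in UTTERANCES})

def pragmatic_listener(utterance, alpha):
    # Uniform prior over objects times the speaker's probability of the utterance.
    return normalize({o: speaker(o, alpha)[utterance] for o in OBJECTS})

results = {
    "alpha_001": pragmatic_listener("blue", 0.01),
    "alpha_1": pragmatic_listener("blue", 1),
    "alpha_4": pragmatic_listener("blue", 4),
    "alpha_10": pragmatic_listener("blue", 10),
}
```

For 'blue', the blue square gets 0.5 / (0.5 + 0.5^alpha / (1 + 0.5^alpha)): just above 0.5 at alpha = 0.01, exactly 0.6 at alpha = 1, and approaching 1 as alpha grows, since a more rational speaker would have said 'circle' for the blue circle.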

answer spec record(alpha_001, alpha_1, alpha_4, alpha_10)

{
  "kind": "record",
  "fields": {
    "alpha_001": {
      "kind": "dist",
      "domain": "finite",
      "support": [
        {
          "shape": "square",
          "color": "blue"
        },
        {
          "shape": "circle",
          "color": "blue"
        },
        {
          "shape": "square",
          "color": "green"
        }
      ]
    },
    "alpha_1": {
      "kind": "dist",
      "domain": "finite",
      "support": [
        {
          "shape": "square",
          "color": "blue"
        },
        {
          "shape": "circle",
          "color": "blue"
        },
        {
          "shape": "square",
          "color": "green"
        }
      ]
    },
    "alpha_4": {
      "kind": "dist",
      "domain": "finite",
      "support": [
        {
          "shape": "square",
          "color": "blue"
        },
        {
          "shape": "circle",
          "color": "blue"
        },
        {
          "shape": "square",
          "color": "green"
        }
      ]
    },
    "alpha_10": {
      "kind": "dist",
      "domain": "finite",
      "support": [
        {
          "shape": "square",
          "color": "blue"
        },
        {
          "shape": "circle",
          "color": "blue"
        },
        {
          "shape": "square",
          "color": "green"
        }
      ]
    }
  }
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var meaningPrior = function() {
  uniformDraw([
    {shape: "square", color: "blue"},
    {shape: "circle", color: "blue"},
    {shape: "square", color: "green"}
  ])
};

var utterances = ["blue","green","square","circle"];

var meaning = function(utterance, obj){
  (utterance === "blue" || utterance === "green") ? utterance === obj.color :
  (utterance === "circle" || utterance === "square") ? utterance === obj.shape :
  true
};

var literalListener = function(utterance){
  return Infer({model: function(){
    var obj = meaningPrior();
    condition(meaning(utterance, obj));
    return obj;
  }});
};

var speaker = function(obj,alpha){
  return Infer({model: function(){
    var utterance = uniformDraw(utterances);
    factor(alpha * literalListener(utterance).score(obj));
    return utterance;
  }});
};

var pragmaticListener = function(utterance,alpha){
  return Infer({model: function(){
    var obj = meaningPrior();
    observe(speaker(obj,alpha),utterance);
    return obj;
  }});
};
var ANSWER = (({
  alpha_001: pragmaticListener("blue", 0.01),
  alpha_1: pragmaticListener("blue", 1),
  alpha_4: pragmaticListener("blue", 4),
  alpha_10: pragmaticListener("blue", 10)
}));

◆ realization 0.000

python

# RSA (Rational Speech Acts), three levels, faithful to the WebPPL reference.
# Each level is genuine Pyro enumeration inference over a single-sample model;
# inner-level distributions feed the outer level's pyro.factor via their log-prob.
# Imports added so the snippet runs standalone; the original harness may inject them.
import json
import math

import torch
import pyro
import pyro.distributions as dist

objects = [
    {"shape": "square", "color": "blue"},
    {"shape": "circle", "color": "blue"},
    {"shape": "square", "color": "green"},
]
utterances = ["blue", "green", "square", "circle"]


def meaning(utterance, obj):
    if utterance == "blue" or utterance == "green":
        return utterance == obj["color"]
    if utterance == "circle" or utterance == "square":
        return utterance == obj["shape"]
    return True


def enum_dist(model, values):
    # Run exact enumeration over the single latent site 'x' (an index into
    # `values`) and return {value_index: probability} from the marginal.
    marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(
        model, lambda: None
    )["x"]
    sup = marg.enumerate_support()
    probs = marg.log_prob(sup).exp()
    out = {}
    for s, p in zip(sup.tolist(), probs.tolist()):
        out[int(s)] = p
    return out


def literal_listener(utterance):
    # uniform prior over objects, condition on the utterance being true
    @pyro.infer.config_enumerate
    def model():
        x = pyro.sample(
            "x", dist.Categorical(torch.ones(len(objects)))
        )
        truths = torch.tensor(
            [1.0 if meaning(utterance, o) else 0.0 for o in objects]
        )
        ev = torch.log(truths)[x]
        pyro.factor("ev", ev)
        return x

    return enum_dist(model, objects)


def speaker(obj_idx, alpha):
    # cache literal-listener log-scores of obj_idx under each utterance
    ll_scores = []
    for u in utterances:
        d = literal_listener(u)
        p = d.get(obj_idx, 0.0)
        ll_scores.append(math.log(p) if p > 0 else float("-inf"))
    ll_scores = torch.tensor(ll_scores)

    @pyro.infer.config_enumerate
    def model():
        x = pyro.sample("x", dist.Categorical(torch.ones(len(utterances))))
        pyro.factor("ev", alpha * ll_scores[x])
        return x

    return enum_dist(model, utterances)


def pragmatic_listener(utterance, alpha):
    # uniform prior over objects, observe the speaker uttering `utterance`
    u_idx = utterances.index(utterance)
    # speaker(obj, alpha) log-prob of the heard utterance, per object
    sp_scores = []
    for i in range(len(objects)):
        d = speaker(i, alpha)
        p = d.get(u_idx, 0.0)
        sp_scores.append(math.log(p) if p > 0 else float("-inf"))
    sp_scores = torch.tensor(sp_scores)

    @pyro.infer.config_enumerate
    def model():
        x = pyro.sample("x", dist.Categorical(torch.ones(len(objects))))
        pyro.factor("ev", sp_scores[x])
        return x

    d = enum_dist(model, objects)
    return {i: d.get(i, 0.0) for i in range(len(objects))}


def as_record_dist(idx_dist):
    # Key each object by its canonical JSON encoding.
    out = {}
    for i, o in enumerate(objects):
        out[json.dumps(o, sort_keys=True)] = idx_dist.get(i, 0.0)
    return out


ANSWER = {
    "alpha_001": as_record_dist(pragmatic_listener("blue", 0.01)),
    "alpha_1": as_record_dist(pragmatic_listener("blue", 1)),
    "alpha_4": as_record_dist(pragmatic_listener("blue", 4)),
    "alpha_10": as_record_dist(pragmatic_listener("blue", 10)),
}

02 answer overlay: webppl vs pyro · record(alpha_001, alpha_1, alpha_4, alpha_10)

alpha_001

webppl · pyro · 4 bins

alpha_1

webppl · pyro · 4 bins

alpha_4

webppl · pyro · 4 bins

alpha_10

webppl · pyro · 4 bins

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0000 (record)
solver re-derivation	accept	2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000

probmods2-agents-as-programs / ex4.b

answer record(L1, L2) solver accept pyro pass 0.0000

00 statement source: exercises/04.1-agents-as-programs.md

given

Same three-object world as the standard RSA setup: objects {square, blue}, {circle, blue}, {square, green} drawn uniformly; utterances {blue, green, square, circle}; same truth function (color/shape match, vacuously true otherwise). Rationality parameter alpha = 1.

model

Two-level RSA stack built on top of the literal listener. The level-1 listener infers the object by combining the prior with a level-1 speaker; the level-1 speaker weights utterances by exp(alpha * log P(object | utterance)) under the literal listener. The level-2 listener infers the object from a level-2 speaker; the level-2 speaker weights utterances by exp(alpha * log P(object | utterance)) under the level-1 listener.

query

The posterior distributions over objects given the utterance 'blue' for the level-1 and level-2 listeners. Return a record with fields L1 and L2.
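
The stack can be written as a mutual recursion indexed by level: listener 0 is the literal listener, speaker n reasons about listener n-1, and listener n reasons about speaker n. A minimal dictionary-based sketch, with hypothetical function names and objects encoded as (shape, color) tuples:

```python
OBJECTS = [("square", "blue"), ("circle", "blue"), ("square", "green")]
UTTERANCES = ["blue", "green", "square", "circle"]
ALPHA = 1.0

def meaning(utterance, obj):
    shape, color = obj
    if utterance in ("blue", "green"):
        return utterance == color
    return utterance == shape

def normalize(weights):
    z = sum(weights.values())
    return {key: w / z for key, w in weights.items()}

def listener(level, utterance):
    if level == 0:
        # Literal listener: uniform prior conditioned on truth of the utterance.
        return normalize({o: 1.0 if meaning(utterance, o) else 0.0 for o in OBJECTS})
    # Level-n listener: uniform prior times the level-n speaker's probability.
    return normalize({o: speaker(level, o)[utterance] for o in OBJECTS})

def speaker(level, obj):
    # Level-n speaker: weight each utterance by P_listener(n-1)(obj | u) ** alpha.
    return normalize({u: listener(level - 1, u)[obj] ** ALPHA for u in UTTERANCES})

L1 = listener(1, "blue")
L2 = listener(2, "blue")
```

At alpha = 1 the blue square receives 3/5 under the level-1 listener and 7/11 under the level-2 listener, so one extra level of recursion sharpens the inference only slightly.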

answer spec record(L1, L2)

{
  "kind": "record",
  "fields": {
    "L1": {
      "kind": "dist",
      "domain": "finite",
      "labels": {
        "record": {
          "shape": "string",
          "color": "string"
        }
      }
    },
    "L2": {
      "kind": "dist",
      "domain": "finite",
      "labels": {
        "record": {
          "shape": "string",
          "color": "string"
        }
      }
    }
  }
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var meaningPrior = function() {
  uniformDraw([
    {shape: "square", color: "blue"},
    {shape: "circle", color: "blue"},
    {shape: "square", color: "green"}
  ])
};

var utterances = ["blue","green","square","circle"];

var meaning = function(utterance, obj){
  (utterance === "blue" || utterance === "green") ? utterance === obj.color :
  (utterance === "circle" || utterance === "square") ? utterance === obj.shape :
  true
};

var alpha = 1;

var literalListener = function(utterance){
  return Infer({model: function(){
    var obj = meaningPrior();
    condition(meaning(utterance, obj));
    return obj;
  }});
};

var speaker = function(obj){
  return Infer({model: function(){
    var utterance = uniformDraw(utterances);
    factor(alpha * literalListener(utterance).score(obj));
    return utterance;
  }});
};

var pragmaticListener = function(utterance){
  return Infer({model: function(){
    var obj = meaningPrior();
    observe(speaker(obj),utterance);
    return obj;
  }});
};

var speaker2 = function(obj){
  return Infer({model: function(){
    var utterance = uniformDraw(utterances);
    factor(alpha * pragmaticListener(utterance).score(obj));
    return utterance;
  }});
};

var listener3 = function(utterance){
  return Infer({model: function(){
    var obj = meaningPrior();
    observe(speaker2(obj),utterance);
    return obj;
  }});
};
var ANSWER = (({
  L1: pragmaticListener("blue"),
  L2: listener3("blue")
}));

◆ realization 0.000

python

# Imports added so the snippet runs standalone (assumed to be injected by the harness).
import torch
import pyro
import pyro.distributions as dist
import pyro.infer

# RSA scalar-implicature model. Every level (literal listener, speaker, pragmatic
# listener, speaker2, listener3) is produced by exact Pyro enumeration
# (config_enumerate + TraceEnum_ELBO.compute_marginals). The answer is the
# pragmatic-listener (L1) and the level-3 listener (L2) posteriors over objects
# given utterance 'blue'.

objects = [
    {"shape": "square", "color": "blue"},
    {"shape": "circle", "color": "blue"},
    {"shape": "square", "color": "green"},
]
n_obj = len(objects)
utterances = ["blue", "green", "square", "circle"]
n_utt = len(utterances)
alpha = 1.0


def meaning(utterance, obj):
    if utterance in ("blue", "green"):
        return utterance == obj["color"]
    if utterance in ("circle", "square"):
        return utterance == obj["shape"]
    return True


def literal_listener_logprobs(utterance):
    # uniform prior over objects, condition on meaning(utterance, obj). Exact
    # enumeration over the object latent.
    holds = torch.tensor([1.0 if meaning(utterance, o) else 0.0 for o in objects])

    @pyro.infer.config_enumerate
    def model():
        obj = pyro.sample("obj", dist.Categorical(torch.ones(n_obj) / n_obj))
        logp = torch.where(holds[obj] > 0.0, torch.tensor(0.0),
                           torch.tensor(float("-inf")))
        pyro.factor("meaning", logp)
        return obj
    marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
    m = marg["obj"]
    sup = m.enumerate_support()
    logps = m.log_prob(sup)
    out = torch.full((n_obj,), float("-inf"))
    for s, lp in zip(sup.tolist(), logps.tolist()):
        out[int(s)] = lp
    return out


# precompute literal listener scores: _ll[utterance_idx][obj_idx]
_ll = [literal_listener_logprobs(u) for u in utterances]


def speaker_logprobs(obj_idx):
    # enumerate utterance ~ uniform, factor(alpha * literalListener(utt).score(obj))
    scores = torch.tensor([alpha * _ll[u][obj_idx].item() for u in range(n_utt)])

    @pyro.infer.config_enumerate
    def model():
        utt = pyro.sample("utt", dist.Categorical(torch.ones(n_utt) / n_utt))
        pyro.factor("f", scores[utt])
        return utt
    marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
    m = marg["utt"]
    sup = m.enumerate_support()
    logps = m.log_prob(sup)
    out = torch.full((n_utt,), float("-inf"))
    for s, lp in zip(sup.tolist(), logps.tolist()):
        out[int(s)] = lp
    return out


_speaker = [speaker_logprobs(o) for o in range(n_obj)]


def pragmatic_listener_probs(utterance):
    # obj ~ uniform; observe(speaker(obj), utterance). Enumerate obj.
    u_idx = utterances.index(utterance)
    obs_scores = torch.tensor([_speaker[o][u_idx].item() for o in range(n_obj)])

    @pyro.infer.config_enumerate
    def model():
        obj = pyro.sample("obj", dist.Categorical(torch.ones(n_obj) / n_obj))
        pyro.factor("obs", obs_scores[obj])
        return obj
    marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
    m = marg["obj"]
    sup = m.enumerate_support()
    probs = m.log_prob(sup).exp()
    out = torch.zeros(n_obj)
    for s, p in zip(sup.tolist(), probs.tolist()):
        out[int(s)] = p
    return out


def pragmatic_listener_logprobs(utterance):
    p = pragmatic_listener_probs(utterance)
    return torch.log(p.clamp_min(1e-300))


# precompute pragmatic listener scores for speaker2: _pl[utterance_idx][obj_idx]
_pl = [pragmatic_listener_logprobs(u) for u in utterances]


def speaker2_logprobs(obj_idx):
    scores = torch.tensor([alpha * _pl[u][obj_idx].item() for u in range(n_utt)])

    @pyro.infer.config_enumerate
    def model():
        utt = pyro.sample("utt", dist.Categorical(torch.ones(n_utt) / n_utt))
        pyro.factor("f", scores[utt])
        return utt
    marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
    m = marg["utt"]
    sup = m.enumerate_support()
    logps = m.log_prob(sup)
    out = torch.full((n_utt,), float("-inf"))
    for s, lp in zip(sup.tolist(), logps.tolist()):
        out[int(s)] = lp
    return out


_speaker2 = [speaker2_logprobs(o) for o in range(n_obj)]


def listener3_probs(utterance):
    u_idx = utterances.index(utterance)
    obs_scores = torch.tensor([_speaker2[o][u_idx].item() for o in range(n_obj)])

    @pyro.infer.config_enumerate
    def model():
        obj = pyro.sample("obj", dist.Categorical(torch.ones(n_obj) / n_obj))
        pyro.factor("obs", obs_scores[obj])
        return obj
    marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
    m = marg["obj"]
    sup = m.enumerate_support()
    probs = m.log_prob(sup).exp()
    out = torch.zeros(n_obj)
    for s, p in zip(sup.tolist(), probs.tolist()):
        out[int(s)] = p
    return out


def to_dist(probs):
    # key each outcome by its named-field record (sorted keys: color, shape),
    # serialized as compact JSON to match the harness's label space.
    d = {}
    for i, o in enumerate(objects):
        key = '{"color": "%s", "shape": "%s"}' % (o["color"], o["shape"])
        d[key] = float(probs[i].item())
    return d


L1 = to_dist(pragmatic_listener_probs("blue"))
L2 = to_dist(listener3_probs("blue"))
ANSWER = {"L1": L1, "L2": L2}

02 answer overlay: webppl vs pyro · record(L1, L2)

L1: webppl, pyro · 2 bins

L2: webppl, pyro · 3 bins

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0000 (record)
solver re-derivation	accept	2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000

probmods2-bayesian-data-analysis / ex1.2

answer dist/int solver accept pyro pass 0.0220

00 statement source: exercises/bayesian-data-analysis.md

given

Observed data: k=1 success in n=20 Bernoulli trials. Prior on the success probability p: Beta(a=1, b=1). A new experiment has new_n=5 trials.

model

The success probability p is drawn from the prior. The observed count k is generated from Binomial(p, n). A posterior-predictive count is the number of successes in a fresh Binomial(p, new_n) draw using the same p.

query

The marginal posterior distribution over the posterior-predictive count (an integer 0 through 5).
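
Because the Beta(1, 1) prior is conjugate to the binomial likelihood, this query also has a closed form that can serve as a sanity check on the sampled answers below. The following is a minimal sketch, not part of the benchmark: the function names are hypothetical, and it assumes the standard beta-binomial result (posterior Beta(a + k, b + n - k), predictive count distributed as a beta-binomial).

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Log of the Beta function B(a, b), via log-gamma for numerical stability."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def posterior_predictive(k, n, new_n, a=1.0, b=1.0):
    """Exact beta-binomial pmf over the number of successes in new_n fresh trials."""
    post_a, post_b = a + k, b + n - k  # conjugate posterior Beta(post_a, post_b)
    return [
        comb(new_n, j)
        * exp(log_beta(post_a + j, post_b + new_n - j) - log_beta(post_a, post_b))
        for j in range(new_n + 1)
    ]


pmf = posterior_predictive(k=1, n=20, new_n=5)
for count, prob in enumerate(pmf):
    print(count, round(prob, 4))
```

With k=1, n=20 the posterior is Beta(2, 20), so most of the predictive mass sits on a count of 0 (probability 420/650, about 0.646). The MCMC-based realizations in this card should approximate this pmf up to sampling noise.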

answer spec dist/int

{
  "kind": "dist",
  "domain": "int"
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var k = 1;
var n = 20;
var new_n = 5;
var priorDist = Beta({a: 1, b: 1});

var model = function() {
   var p = sample(priorDist);
   observe(Binomial({p : p, n: n}), k);
   var posteriorPredictive = binomial(p, new_n);
   var prior_p = sample(priorDist);
   var priorPredictive = binomial(prior_p, n);
   return {
       prior: prior_p, priorPredictive : priorPredictive,
       posterior : p, posteriorPredictive : posteriorPredictive
   };
};
var joint = Infer({method: "MCMC", samples: 2500, lag: 50}, model);
var ANSWER = marginalize(joint, function(x) { return x.posteriorPredictive; });

◆ realization 0.022

python

# Imports added so the snippet runs standalone (assumed to be injected by the harness).
import torch
import pyro
import pyro.distributions as dist
import pyro.infer

k = 1
n = 20
new_n = 5
prior_dist = dist.Beta(torch.tensor(1.0), torch.tensor(1.0))

def model():
    p = pyro.sample("p", prior_dist)
    pyro.sample("obs", dist.Binomial(total_count=n, probs=p), obs=torch.tensor(float(k)))

nuts = pyro.infer.NUTS(model)
mcmc = pyro.infer.MCMC(nuts, num_samples=2500, warmup_steps=1000)
mcmc.run()
_p_samples = mcmc.get_samples()["p"]

# Posterior predictive: a fresh Binomial(p, new_n) draw per posterior p sample.
_pp = dist.Binomial(total_count=new_n, probs=_p_samples).sample()
ANSWER = [int(x) for x in _pp.tolist()]

02 answer overlay: webppl vs pyro · dist/int

webppl, pyro · 6 bins · 0 … 5

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0364 (w1)
solver re-derivation	accept	2/2 solvers · d=[0.026, 0.026] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0220 ≤ tol 0.0896 · floors 0.0448/0.0364

probmods2-conditional-dependence / ex1.a

answer record(prior, death, deathAndCold, deathAndNoCold) solver accept pyro pass 0.0000

00 statement source: exercises/conditional-dependence.md

given

Cancer occurs with probability 0.00001. Given cancer, death from cancer occurs with probability 0.9. The common cold occurs with probability 0.2; given a cold, death from the cold occurs with probability 0.00006. Death from other causes (independent of cancer and cold) occurs with probability 0.000000001. A person dies if they die from cancer, from the cold, or from other causes.

model

Cancer, cold, and other-cause death are drawn independently from their priors. Death from cancer requires having cancer; death from the cold requires having a cold. The person dies if any cause of death occurs.

query

A record of four posterior distributions over whether the person has cancer: prior (unconditional); death (given the person died); deathAndCold (given the person died and had a cold); deathAndNoCold (given the person died and did not have a cold).
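
The four queries above can be checked by brute-force enumeration of the five binary choices, independent of either probabilistic programming language. This is a minimal sketch under that assumption; `bern` and `cancer_posterior` are hypothetical helper names, not part of the benchmark.

```python
from itertools import product


def bern(p, value):
    """Probability that a Bernoulli(p) variable takes the given boolean value."""
    return p if value else 1.0 - p


def cancer_posterior(evidence):
    """Posterior over `cancer` given evidence(death, cold) -> bool."""
    weights = {True: 0.0, False: 0.0}
    for cancer, cold, dbc, dbcold, other in product([True, False], repeat=5):
        death = (cancer and dbc) or (cold and dbcold) or other
        if not evidence(death, cold):
            continue  # hard conditioning: drop worlds inconsistent with the evidence
        weights[cancer] += (
            bern(0.00001, cancer) * bern(0.2, cold) * bern(0.9, dbc)
            * bern(0.00006, dbcold) * bern(0.000000001, other)
        )
    total = sum(weights.values())
    return {value: w / total for value, w in weights.items()}


posteriors = {
    "prior": cancer_posterior(lambda death, cold: True),
    "death": cancer_posterior(lambda death, cold: death),
    "deathAndCold": cancer_posterior(lambda death, cold: death and cold),
    "deathAndNoCold": cancer_posterior(lambda death, cold: death and not cold),
}
for name, dist_ in posteriors.items():
    print(name, dist_[True])
```

The ordering of the results illustrates explaining away: learning that the person had a cold lowers the probability of cancer relative to knowing only that they died, while ruling out a cold raises it to nearly 1.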

answer spec record(prior, death, deathAndCold, deathAndNoCold)

{
  "kind": "record",
  "fields": {
    "prior": {
      "kind": "dist",
      "domain": "bool"
    },
    "death": {
      "kind": "dist",
      "domain": "bool"
    },
    "deathAndCold": {
      "kind": "dist",
      "domain": "bool"
    },
    "deathAndNoCold": {
      "kind": "dist",
      "domain": "bool"
    }
  }
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var ANSWER = (({
  prior: Infer({method: 'enumerate'}, function() {
    var cancer = flip(0.00001);
    var cold = flip(0.2);
    var death_by_cancer = cancer && flip(0.9);
    var death_by_cold = cold && flip(0.00006);
    var other_death = flip(0.000000001);
    var death = death_by_cancer || death_by_cold || other_death;
    return cancer;
  }),
  death: Infer({method: 'enumerate'}, function() {
    var cancer = flip(0.00001);
    var cold = flip(0.2);
    var death_by_cancer = cancer && flip(0.9);
    var death_by_cold = cold && flip(0.00006);
    var other_death = flip(0.000000001);
    var death = death_by_cancer || death_by_cold || other_death;
    condition(death);
    return cancer;
  }),
  deathAndCold: Infer({method: 'enumerate'}, function() {
    var cancer = flip(0.00001);
    var cold = flip(0.2);
    var death_by_cancer = cancer && flip(0.9);
    var death_by_cold = cold && flip(0.00006);
    var other_death = flip(0.000000001);
    var death = death_by_cancer || death_by_cold || other_death;
    condition(death && cold);
    return cancer;
  }),
  deathAndNoCold: Infer({method: 'enumerate'}, function() {
    var cancer = flip(0.00001);
    var cold = flip(0.2);
    var death_by_cancer = cancer && flip(0.9);
    var death_by_cold = cold && flip(0.00006);
    var other_death = flip(0.000000001);
    var death = death_by_cancer || death_by_cold || other_death;
    condition(death && !cold);
    return cancer;
  })
}));

◆ realization 0.000

python

# Imports added so the snippet runs standalone (assumed to be injected by the harness).
import torch
import pyro
import pyro.distributions as dist
import pyro.infer

def make_model(cond):
    @pyro.infer.config_enumerate
    def model():
        cancer = pyro.sample('cancer', dist.Bernoulli(0.00001)).bool()
        cold = pyro.sample('cold', dist.Bernoulli(0.2)).bool()
        dbc_coin = pyro.sample('dbc', dist.Bernoulli(0.9)).bool()
        dbcold_coin = pyro.sample('dbcold', dist.Bernoulli(0.00006)).bool()
        other = pyro.sample('other', dist.Bernoulli(0.000000001)).bool()
        death_by_cancer = cancer & dbc_coin
        death_by_cold = cold & dbcold_coin
        death = death_by_cancer | death_by_cold | other
        ev = cond(death, cold)
        # Hard conditioning: log-weight 0 where the evidence holds, -inf elsewhere.
        pyro.factor('ev', torch.where(ev, torch.tensor(0.0), torch.tensor(float('-inf'))))
        return cancer
    return model


def marginal_bool(cond):
    model = make_model(cond)
    marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
    m = marg['cancer']
    sup = m.enumerate_support()
    probs = m.log_prob(sup).exp()
    out = {}
    for v, p in zip(sup.tolist(), probs.tolist()):
        out[bool(v)] = out.get(bool(v), 0.0) + p
    return out


ANSWER = {
    'prior': marginal_bool(lambda death, cold: torch.tensor(True)),
    'death': marginal_bool(lambda death, cold: death),
    'deathAndCold': marginal_bool(lambda death, cold: death & cold),
    'deathAndNoCold': marginal_bool(lambda death, cold: death & (~cold)),
}

02 answer overlay: webppl vs pyro · record(prior, death, deathAndCold, deathAndNoCold)

prior: webppl, pyro · 2 bins

death: webppl, pyro · 2 bins

deathAndCold: webppl, pyro · 2 bins

deathAndNoCold: webppl, pyro · 2 bins

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0000 (record)
solver re-derivation	accept	2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000

probmods2-conditional-dependence / ex1.b

answer record(prior, death, deathAndCancer, deathAndNoCancer) solver accept pyro pass 0.0000

00 statement source: exercises/conditional-dependence.md

given

Cancer occurs with probability 0.00001. Given cancer, death from cancer occurs with probability 0.9. The common cold occurs with probability 0.2; given a cold, death from the cold occurs with probability 0.00006. Death from other causes (independent of cancer and cold) occurs with probability 0.000000001. A person dies if they die from cancer, from the cold, or from other causes.

model

Cancer, cold, and other-cause death are drawn independently from their priors. Death from cancer requires having cancer; death from the cold requires having a cold. The person dies if any cause of death occurs.

query

A record of four posterior distributions over whether the person has a cold: prior (unconditional); death (given the person died); deathAndCancer (given the person died and had cancer); deathAndNoCancer (given the person died and did not have cancer).
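
Since death is a noisy-OR of three independent pathways, the cold posteriors can also be written in closed form instead of enumerating every coin. The sketch below assumes that noisy-OR reading of the statement; the constant and function names are hypothetical and chosen only for illustration.

```python
# Probabilities taken from the problem statement.
P_CANCER, P_COLD = 0.00001, 0.2
P_DIE_CANCER, P_DIE_COLD, P_DIE_OTHER = 0.9, 0.00006, 0.000000001


def p_death(cancer, cold):
    """Noisy-OR: the person dies unless every active pathway fails to fire."""
    survive = (
        (1 - P_DIE_CANCER * cancer) * (1 - P_DIE_COLD * cold) * (1 - P_DIE_OTHER)
    )
    return 1 - survive


def p_cold_given_death(cancer=None):
    """P(cold | death), optionally also conditioning on a known cancer status."""
    if cancer is None:
        cancer_weights = {True: P_CANCER, False: 1 - P_CANCER}
    else:
        cancer_weights = {cancer: 1.0}  # the cancer prior cancels when cancer is observed
    joint = {}
    for cold in (True, False):
        prior_cold = P_COLD if cold else 1 - P_COLD
        joint[cold] = prior_cold * sum(
            w * p_death(c, cold) for c, w in cancer_weights.items()
        )
    return joint[True] / (joint[True] + joint[False])


print("death:", p_cold_given_death())
print("deathAndCancer:", p_cold_given_death(cancer=True))
print("deathAndNoCancer:", p_cold_given_death(cancer=False))
```

Given cancer, the cold posterior falls back to roughly its 0.2 prior, because cancer already explains the death; without cancer, a cold becomes the overwhelmingly likely cause.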

answer spec record(prior, death, deathAndCancer, deathAndNoCancer)

{
  "kind": "record",
  "fields": {
    "prior": {
      "kind": "dist",
      "domain": "bool"
    },
    "death": {
      "kind": "dist",
      "domain": "bool"
    },
    "deathAndCancer": {
      "kind": "dist",
      "domain": "bool"
    },
    "deathAndNoCancer": {
      "kind": "dist",
      "domain": "bool"
    }
  }
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var ANSWER = (({
  prior: Infer({method: 'enumerate'}, function() {
    var cancer = flip(0.00001);
    var cold = flip(0.2);
    var death_by_cancer = cancer && flip(0.9);
    var death_by_cold = cold && flip(0.00006);
    var other_death = flip(0.000000001);
    var death = death_by_cancer || death_by_cold || other_death;
    return cold;
  }),
  death: Infer({method: 'enumerate'}, function() {
    var cancer = flip(0.00001);
    var cold = flip(0.2);
    var death_by_cancer = cancer && flip(0.9);
    var death_by_cold = cold && flip(0.00006);
    var other_death = flip(0.000000001);
    var death = death_by_cancer || death_by_cold || other_death;
    condition(death);
    return cold;
  }),
  deathAndCancer: Infer({method: 'enumerate'}, function() {
    var cancer = flip(0.00001);
    var cold = flip(0.2);
    var death_by_cancer = cancer && flip(0.9);
    var death_by_cold = cold && flip(0.00006);
    var other_death = flip(0.000000001);
    var death = death_by_cancer || death_by_cold || other_death;
    condition(death && cancer);
    return cold;
  }),
  deathAndNoCancer: Infer({method: 'enumerate'}, function() {
    var cancer = flip(0.00001);
    var cold = flip(0.2);
    var death_by_cancer = cancer && flip(0.9);
    var death_by_cold = cold && flip(0.00006);
    var other_death = flip(0.000000001);
    var death = death_by_cancer || death_by_cold || other_death;
    condition(death && !cancer);
    return cold;
  })
}));

◆ realization 0.000

python

# Imports added so the snippet runs standalone (assumed to be injected by the harness).
import torch
import pyro
import pyro.distributions as dist
import pyro.infer

# Cold / cancer / death model. Four posteriors over `cold` under different
# conditioning, each produced by exact Pyro discrete enumeration
# (config_enumerate + TraceEnum_ELBO.compute_marginals).

def cold_posterior(condition_fn):
    @pyro.infer.config_enumerate
    def model():
        cancer = pyro.sample("cancer", dist.Bernoulli(0.00001))
        cold = pyro.sample("cold", dist.Bernoulli(0.2))
        dbc_flip = pyro.sample("dbc", dist.Bernoulli(0.9))
        dbcold_flip = pyro.sample("dbcold", dist.Bernoulli(0.00006))
        other = pyro.sample("other", dist.Bernoulli(0.000000001))
        cancer_b = cancer.bool()
        cold_b = cold.bool()
        death_by_cancer = cancer_b & dbc_flip.bool()
        death_by_cold = cold_b & dbcold_flip.bool()
        other_death = other.bool()
        death = death_by_cancer | death_by_cold | other_death
        ev = condition_fn(death, cancer_b)
        if ev is not None:
            pyro.factor("cond", torch.where(ev, torch.tensor(0.0),
                                            torch.tensor(float("-inf"))))
        return cold
    marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
    m = marg["cold"]
    sup = m.enumerate_support()
    probs = m.log_prob(sup).exp()
    out = {}
    for s, p in zip(sup.tolist(), probs.tolist()):
        out[bool(int(s))] = p
    return out


prior = cold_posterior(lambda death, cancer: None)
death = cold_posterior(lambda death, cancer: death)
deathAndCancer = cold_posterior(lambda death, cancer: death & cancer)
deathAndNoCancer = cold_posterior(lambda death, cancer: death & ~cancer)

ANSWER = {
    "prior": prior,
    "death": death,
    "deathAndCancer": deathAndCancer,
    "deathAndNoCancer": deathAndNoCancer,
}

02 answer overlay: webppl vs pyro · record(prior, death, deathAndCancer, deathAndNoCancer)

prior: webppl, pyro · 2 bins

death: webppl, pyro · 2 bins

deathAndCancer: webppl, pyro · 2 bins

deathAndNoCancer: webppl, pyro · 2 bins

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0000 (record)
solver re-derivation	accept	2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000

probmods2-conditioning / ex1.a

answer value/real solver accept pyro pass 0.0000

00 statement source: exercises/conditioning.md

given

A fair coin has probability 0.5 of landing heads.

model

A single fair coin is flipped once.

query

The probability that the coin lands heads.

answer spec value/real

{
  "kind": "value",
  "domain": "real"
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var model = function() { return flip() ? "H" : "T" };
var ANSWER = (Math.exp(Infer({method:'enumerate'}, model).score('H')));

◆ realization 0.000

python

# Imports added so the snippet runs standalone (assumed to be injected by the harness).
import torch
import pyro
import pyro.distributions as dist
import pyro.infer

@pyro.infer.config_enumerate
def model():
    h = pyro.sample('h', dist.Bernoulli(0.5))
    return h


marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
m = marg['h']
sup = m.enumerate_support()
probs = m.log_prob(sup).exp()
# Sum the marginal mass on the heads (1.0) outcome.
p_heads = 0.0
for v, p in zip(sup.tolist(), probs.tolist()):
    if bool(v):
        p_heads += p
ANSWER = p_heads

02 answer · value/real

webppl: 0.5000

pyro: 0.5000

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0000 (absdiff)
solver re-derivation	accept	2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000

probmods2-conditioning / ex1.b

answer dist/bool solver accept pyro pass 0.0000

00 statement source: exercises/conditioning.md

given

There are two coins: one fair (P(heads) = 0.5) and one biased (P(heads) = 0.9). One coin is selected uniformly at random. The selected coin is flipped three times independently. The first two flips both landed heads.

model

A coin type (fair or biased) is drawn with equal probability. Each flip of the selected coin is independent with heads probability determined by the coin type. The first two flips are observed to be heads.

query

The posterior distribution over whether the third flip lands heads (true) or tails (false).
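
The query reduces to two steps: update the belief about the coin type from the first two flips, then average the heads probability under that posterior. A minimal sketch of that calculation with exact rational arithmetic follows; `HEADS_P` and `third_flip_heads` are hypothetical names introduced for illustration only.

```python
from fractions import Fraction

# Heads probability for each coin type, from the problem statement.
HEADS_P = {"fair": Fraction(1, 2), "biased": Fraction(9, 10)}


def third_flip_heads(evidence_likelihood):
    """P(third flip is heads | evidence), with a uniform prior over coin type.

    evidence_likelihood(p) is the probability of the observed first two flips
    for a coin whose heads probability is p.
    """
    weights = {
        coin: Fraction(1, 2) * evidence_likelihood(p) for coin, p in HEADS_P.items()
    }
    total = sum(weights.values())
    # Mix the per-coin heads probabilities by the posterior over coin type.
    return sum(weights[coin] / total * HEADS_P[coin] for coin in HEADS_P)


p_heads = third_flip_heads(lambda p: p * p)  # evidence: two heads in a row
print(p_heads, float(p_heads))
```

Two heads shift belief toward the biased coin (posterior 81/106), so the predictive probability of a third head is 427/530, about 0.806, rather than the prior average of 0.7.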

answer spec dist/bool

{
  "kind": "dist",
  "domain": "bool"
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var flipCoin = function(coinType) {
  return coinType == "fair" ? flip() : flip(0.9);
}
var model = function() {
  var coinType = flip() ? "fair" : "biased";
  var flip1 = flipCoin(coinType);
  var flip2 = flipCoin(coinType);
  var flip3 = flipCoin(coinType);
  condition(flip1 && flip2);
  return flip3;
};
var ANSWER = (Infer({method:'enumerate'}, model));

◆ realization 0.000

python

# Imports added so the snippet runs standalone (assumed to be injected by the harness).
import torch
import pyro
import pyro.distributions as dist
import pyro.infer

@pyro.infer.config_enumerate
def model():
    fair = pyro.sample("fair", dist.Bernoulli(0.5)).bool()
    p = torch.where(fair, torch.tensor(0.5), torch.tensor(0.9))
    flip1 = pyro.sample("flip1", dist.Bernoulli(p)).bool()
    flip2 = pyro.sample("flip2", dist.Bernoulli(p)).bool()
    pyro.sample("flip3", dist.Bernoulli(p))
    # Hard conditioning on the first two flips both being heads.
    ev = flip1 & flip2
    pyro.factor("cond", torch.where(ev, torch.tensor(0.0), torch.tensor(float("-inf"))))

marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
m = marg["flip3"]
sup = m.enumerate_support()
probs = m.log_prob(sup).exp()
ANSWER = {bool(s.item()): float(p.item()) for s, p in zip(sup, probs)}

02 answer overlay: webppl vs pyro · dist/bool

webppl, pyro · 2 bins

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0000 (tv)
solver re-derivation	accept	2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000

probmods2-conditioning / ex1.c

answer dist/finite solver accept pyro pass 0.0000

00 statement source: exercises/conditioning.md

given

There are two coins: one fair (P(heads) = 0.5) and one biased (P(heads) = 0.9). One coin is selected uniformly at random. The selected coin is flipped three times independently. All three flips landed heads.

model

A coin type (fair or biased) is drawn with equal probability. Each flip of the selected coin is independent with heads probability determined by the coin type. All three flips are observed to be heads.

query

The posterior distribution over the coin type — the string 'fair' or the string 'biased'.
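
As a hand check on the enumerated answer, Bayes' rule with the uniform prior over coin types gives the posterior directly. This worked equation is a sketch derived from the stated probabilities, not taken from the benchmark output; HHH denotes the observed three heads.

```latex
\begin{align*}
P(\text{fair} \mid HHH)
  &= \frac{0.5 \cdot 0.5^{3}}{0.5 \cdot 0.5^{3} + 0.5 \cdot 0.9^{3}}
   = \frac{0.125}{0.125 + 0.729} \approx 0.146, \\
P(\text{biased} \mid HHH)
  &= \frac{0.729}{0.125 + 0.729} \approx 0.854.
\end{align*}
```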

answer spec dist/finite

{
  "kind": "dist",
  "domain": "finite",
  "support": [
    "fair",
    "biased"
  ]
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var flipCoin = function(coinType) {
  return coinType == "fair" ? flip() : flip(0.9);
}
var model = function() {
  var coinType = flip() ? "fair" : "biased";
  var flip1 = flipCoin(coinType);
  var flip2 = flipCoin(coinType);
  var flip3 = flipCoin(coinType);
  condition(flip1 && flip2 && flip3);
  return coinType;
};
var ANSWER = (Infer({method:'enumerate'}, model));

◆ realization 0.000

python

# Imports added so the snippet runs standalone (assumed to be injected by the harness).
import torch
import pyro
import pyro.distributions as dist
import pyro.infer

# Two coins (fair p=0.5, biased p=0.9), one chosen uniformly, three flips all
# heads. Posterior over coin type by exact enumeration.

coin_types = ["fair", "biased"]
heads_p = {"fair": 0.5, "biased": 0.9}


@pyro.infer.config_enumerate
def model():
    t = pyro.sample("coinType", dist.Categorical(torch.tensor([0.5, 0.5])))
    p = torch.tensor([heads_p["fair"], heads_p["biased"]])[t]
    pyro.sample("flip1", dist.Bernoulli(p), obs=torch.tensor(1.0))
    pyro.sample("flip2", dist.Bernoulli(p), obs=torch.tensor(1.0))
    pyro.sample("flip3", dist.Bernoulli(p), obs=torch.tensor(1.0))
    return t


marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(
    model, lambda: None
)
type_marg = marg["coinType"]
ANSWER = {
    coin_types[i]: torch.exp(type_marg.log_prob(torch.tensor(i))).item()
    for i in range(len(coin_types))
}

02 answer overlay: webppl vs pyro · dist/finite

webppl, pyro · 2 bins

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0000 (tv)
solver re-derivation	accept	2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000

probmods2-conditioning / ex1.d

answer dist/bool solver accept pyro pass 0.0000

00 statement source: exercises/conditioning.md

given

There are two coins: one fair (P(heads) = 0.5) and one biased (P(heads) = 0.9). One coin is selected uniformly at random. The selected coin is flipped three times independently. The first two flips landed on different sides (one heads and one tails).

model

A coin type (fair or biased) is drawn with equal probability. Each flip of the selected coin is independent with heads probability determined by the coin type. The first two flips are observed to have different outcomes.

query

The posterior distribution over whether the third flip lands heads (true) or tails (false).
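
A short derivation shows why mismatched flips favor the fair coin and what that implies for the third flip. The equations below are a sketch computed from the stated probabilities; D is shorthand (introduced here) for the event that the first two flips differ, and H3 for the third flip landing heads.

```latex
\begin{align*}
P(D \mid \text{fair}) &= 2 \cdot 0.5 \cdot 0.5 = 0.5, \qquad
P(D \mid \text{biased}) = 2 \cdot 0.9 \cdot 0.1 = 0.18, \\
P(\text{fair} \mid D) &= \frac{0.5 \cdot 0.5}{0.5 \cdot 0.5 + 0.5 \cdot 0.18}
  = \frac{25}{34} \approx 0.735, \\
P(H_3 \mid D) &= \frac{25}{34} \cdot 0.5 + \frac{9}{34} \cdot 0.9
  = \frac{103}{170} \approx 0.606.
\end{align*}
```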

answer spec dist/bool

{
  "kind": "dist",
  "domain": "bool"
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var flipCoin = function(coinType) {
  return coinType == "fair" ? flip() : flip(0.9);
}
var model = function() {
  var coinType = flip() ? "fair" : "biased";
  var flip1 = flipCoin(coinType);
  var flip2 = flipCoin(coinType);
  var flip3 = flipCoin(coinType);
  condition(flip1 != flip2);
  return flip3;
};
var ANSWER = (Infer({method:'enumerate'}, model));

◆ realization 0.000

python

# Imports added so the snippet runs standalone (assumed to be injected by the harness).
import torch
import pyro
import pyro.distributions as dist
import pyro.infer

# Two coins (fair p=0.5, biased p=0.9), pick one uniformly, flip 3 times,
# condition flip1 != flip2, query distribution over flip3. Exact enumeration.


@pyro.infer.config_enumerate
def model():
    # coinType: 0 = fair (p=0.5), 1 = biased (p=0.9)
    coin = pyro.sample("coin", dist.Categorical(torch.tensor([0.5, 0.5])))
    p = torch.where(coin == 0, torch.tensor(0.5), torch.tensor(0.9))
    f1 = pyro.sample("f1", dist.Bernoulli(p))
    f2 = pyro.sample("f2", dist.Bernoulli(p))
    f3 = pyro.sample("f3", dist.Bernoulli(p))
    diff = f1 != f2
    pyro.factor(
        "ev",
        torch.where(diff, torch.tensor(0.0), torch.tensor(float("-inf"))),
    )
    return f3


marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(
    model, lambda: None
)["f3"]
sup = marg.enumerate_support()
probs = marg.log_prob(sup).exp()
ANSWER = {}
for s, p in zip(sup.tolist(), probs.tolist()):
    ANSWER[bool(s)] = p

02 answer overlay: webppl vs pyro · dist/bool

webppl, pyro · 2 bins

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0000 (tv)
solver re-derivation	accept	2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000

probmods2-conditioning / ex2.a

answer record(original, intervention, conditioning) solver accept pyro pass 0.0000

00 statement source: exercises/conditioning.md

given

Lung cancer is present with prior probability 0.01. A cold is present with prior probability 0.2. A cough occurs if a cold is present and the cough-given-cold flip succeeds (probability 0.5), or if lung cancer is present and the cough-given-cancer flip succeeds (probability 0.3). The two pathways are combined with a logical OR.

model

Lung cancer and cold are independent binary causes of coughing. Each cause contributes independently to producing a cough via its own noisy channel, and a cough results if either channel fires.

query

Return a record with three fields, each a distribution over whether a cough occurs: (1) `original` — the unconditional marginal of cough; (2) `intervention` — the marginal of cough after setting lung cancer to true (an intervention, not conditioning); (3) `conditioning` — the marginal of cough after observing that lung cancer is true (an observation, updating beliefs).
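
The three marginals can be reproduced by enumerating the four binary choices directly. The sketch below is illustrative only: `bern` and `cough_marginal` are hypothetical helpers, and the intervention is modeled, as an assumption, by replacing the lung-cancer prior with a point mass at true.

```python
from itertools import product


def bern(p, value):
    """Probability that a Bernoulli(p) variable takes the given boolean value."""
    return p if value else 1.0 - p


def cough_marginal(lung_cancer_prior, require_cancer=False):
    """Marginal over `cough`; require_cancer=True conditions on lung cancer."""
    weights = {True: 0.0, False: 0.0}
    for lung_cancer, cold, c1, c2 in product([True, False], repeat=4):
        if require_cancer and not lung_cancer:
            continue  # conditioning: discard worlds without lung cancer
        cough = (cold and c1) or (lung_cancer and c2)
        weights[cough] += (
            bern(lung_cancer_prior, lung_cancer) * bern(0.2, cold)
            * bern(0.5, c1) * bern(0.3, c2)
        )
    total = sum(weights.values())
    return {value: w / total for value, w in weights.items()}


original = cough_marginal(0.01)
intervention = cough_marginal(1.0)  # do(lungCancer = true) as a point-mass prior
conditioning = cough_marginal(0.01, require_cancer=True)
print(original[True], intervention[True], conditioning[True])
```

Because lung cancer has no parents in this model, intervening on it and conditioning on it yield the same cough marginal (0.37), while the unconditional probability of a cough is 0.1027.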

answer spec record(original, intervention, conditioning)

{
  "kind": "record",
  "fields": {
    "original": {
      "kind": "dist",
      "domain": "bool"
    },
    "intervention": {
      "kind": "dist",
      "domain": "bool"
    },
    "conditioning": {
      "kind": "dist",
      "domain": "bool"
    }
  }
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var ANSWER = (({
  original: Infer({method: "enumerate"}, function() {
    var lungCancer = flip(0.01);
    var cold = flip(0.2);
    var cough = (cold && flip(0.5)) || (lungCancer && flip(0.3));
    return cough;
  }),
  intervention: Infer({method: "enumerate"}, function() {
    var lungCancer = true;
    var cold = flip(0.2);
    var cough = (cold && flip(0.5)) || (lungCancer && flip(0.3));
    return cough;
  }),
  conditioning: Infer({method: "enumerate"}, function() {
    var lungCancer = flip(0.01);
    condition(lungCancer);
    var cold = flip(0.2);
    var cough = (cold && flip(0.5)) || (lungCancer && flip(0.3));
    return cough;
  })
}));

◆ realization 0.000

python

# Imports added so the snippet runs standalone (assumed to be injected by the harness).
import torch
import pyro
import pyro.distributions as dist
import pyro.infer

# probmods2-conditioning/ex2.a
# Three marginals over `cough`:
#   original     : unconditional
#   intervention : lungCancer set to true (do-operation; lungCancer not a random choice)
#   conditioning : lungCancer observed true (updates beliefs)
# Exact discrete enumeration via config_enumerate + compute_marginals. `cough` is
# made a genuine discrete sample site (a degenerate Bernoulli on its deterministic
# value) so compute_marginals returns a marginal for it.

ZERO = torch.tensor(0.0)
NEG_INF = torch.tensor(float("-inf"))

def original_model():
    lungCancer = pyro.sample("lungCancer", dist.Bernoulli(0.01)).bool()
    cold = pyro.sample("cold", dist.Bernoulli(0.2)).bool()
    c1 = pyro.sample("c1", dist.Bernoulli(0.5)).bool()
    c2 = pyro.sample("c2", dist.Bernoulli(0.3)).bool()
    cough = (cold & c1) | (lungCancer & c2)
    pyro.sample("cough", dist.Bernoulli(cough.double()))
    return cough

def intervention_model():
    # intervention: lungCancer is fixed to true, not a random choice
    lungCancer = torch.tensor(True)
    cold = pyro.sample("cold", dist.Bernoulli(0.2)).bool()
    c1 = pyro.sample("c1", dist.Bernoulli(0.5)).bool()
    c2 = pyro.sample("c2", dist.Bernoulli(0.3)).bool()
    cough = (cold & c1) | (lungCancer & c2)
    pyro.sample("cough", dist.Bernoulli(cough.double()))
    return cough

def conditioning_model():
    lungCancer = pyro.sample("lungCancer", dist.Bernoulli(0.01)).bool()
    pyro.factor("obs_lc", torch.where(lungCancer, ZERO, NEG_INF))
    cold = pyro.sample("cold", dist.Bernoulli(0.2)).bool()
    c1 = pyro.sample("c1", dist.Bernoulli(0.5)).bool()
    c2 = pyro.sample("c2", dist.Bernoulli(0.3)).bool()
    cough = (cold & c1) | (lungCancer & c2)
    pyro.sample("cough", dist.Bernoulli(cough.double()))
    return cough

def marginal_of(model_fn):
    enum = pyro.infer.config_enumerate(model_fn)
    marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(enum, lambda: None)
    m = marg["cough"]
    sup = m.enumerate_support()
    probs = m.log_prob(sup).exp()
    return {bool(s.item()): float(p.item()) for s, p in zip(sup, probs)}

ANSWER = {
    "original": marginal_of(original_model),
    "intervention": marginal_of(intervention_model),
    "conditioning": marginal_of(conditioning_model),
}

02 answer overlay: webppl vs pyro, record(original, intervention, conditioning)

original

[chart omitted; legend: webppl, pyro; 2 bins]

intervention

[chart omitted; legend: webppl, pyro; 2 bins]

conditioning

[chart omitted; legend: webppl, pyro; 2 bins]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0000 (record)
solver re-derivation	accept	2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000

probmods2-conditioning / ex2.b

answer record(original, intervention, conditioning) solver accept pyro pass 0.0000

00 statement source: exercises/conditioning.md

given

A person has lung cancer with probability 0.01 and, independently, has a cold with probability 0.2. Given lung cancer, the lung-cancer pathway produces a cough with probability 0.3; given a cold, the cold pathway produces a cough with probability 0.5. The two pathways act independently: the person coughs if either pathway fires. Coughing is observed to be true.

model

Lung cancer and cold are independent latent causes. Coughing occurs if at least one of the following independent events occurs: the lung-cancer pathway fires (probability 0.3 given lung cancer) or the cold pathway fires (probability 0.5 given a cold). We compare three scenarios for the same underlying system: (1) no observations, (2) coughing is directly forced to be true regardless of its causes (do-operator intervention — the causal parents are unaffected), (3) coughing is observed to be true (conditioning, which propagates information back to the causes).

query

A record with three fields: 'original' — the prior marginal distribution over lung cancer; 'intervention' — the marginal distribution over lung cancer when coughing is forced to true without updating the causal parents; 'conditioning' — the posterior marginal distribution over lung cancer given coughing is observed to be true.
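
As a sanity check on the three fields, the record can be worked out by brute-force enumeration in plain Python. This is a minimal sketch, not the benchmark's own solver; the helper names `bern` and `lung_cancer_marginal` and the `observe_cough` flag are illustrative assumptions.

```python
from itertools import product

P_LUNG_CANCER, P_COLD = 0.01, 0.2
P_COLD_FIRES, P_LUNG_FIRES = 0.5, 0.3


def bern(value, p):
    """Probability that a Bernoulli(p) draw equals `value`."""
    return p if value else 1.0 - p


def lung_cancer_marginal(observe_cough):
    """Return P(lungCancer = True), optionally conditioned on cough = True."""
    weight = {True: 0.0, False: 0.0}
    for lc, cold, c1, c2 in product([True, False], repeat=4):
        prob = (bern(lc, P_LUNG_CANCER) * bern(cold, P_COLD)
                * bern(c1, P_COLD_FIRES) * bern(c2, P_LUNG_FIRES))
        cough = (cold and c1) or (lc and c2)
        if observe_cough and not cough:
            continue  # conditioning discards worlds without a cough
        weight[lc] += prob
    return weight[True] / (weight[True] + weight[False])


original = lung_cancer_marginal(observe_cough=False)
# Forcing cough to true cuts it off from its causes, so the parents keep their prior.
intervention = lung_cancer_marginal(observe_cough=False)
conditioning = lung_cancer_marginal(observe_cough=True)
print(original, intervention, conditioning)
```

Only the conditioning field moves away from the prior of 0.01: it equals 0.0037 / 0.1027, roughly 0.036, because observing a cough makes each of its possible causes more likely.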

answer spec record(original, intervention, conditioning)

{
  "kind": "record",
  "fields": {
    "original": {
      "kind": "dist",
      "domain": "bool"
    },
    "intervention": {
      "kind": "dist",
      "domain": "bool"
    },
    "conditioning": {
      "kind": "dist",
      "domain": "bool"
    }
  }
}

01 realizations comparing webppl vs pyro

◆ground truth

webppl

var ANSWER = (({
  original: Infer({method: "enumerate"}, function() {
    var lungCancer = flip(0.01);
    var cold = flip(0.2);
    var cough = (cold && flip(0.5)) || (lungCancer && flip(0.3));
    return lungCancer;
  }),
  intervention: Infer({method: "enumerate"}, function() {
    var lungCancer = flip(0.01);
    var cold = flip(0.2);
    var cough = true;
    return lungCancer;
  }),
  conditioning: Infer({method: "enumerate"}, function() {
    var lungCancer = flip(0.01);
    var cold = flip(0.2);
    var cough = (cold && flip(0.5)) || (lungCancer && flip(0.3));
    condition(cough);
    return lungCancer;
  })
}));

◆ realization 0.000

python

@pyro.infer.config_enumerate
def original_model():
    lungCancer = pyro.sample("lungCancer", dist.Bernoulli(0.01))
    cold = pyro.sample("cold", dist.Bernoulli(0.2))
    return lungCancer

@pyro.infer.config_enumerate
def intervention_model():
    lungCancer = pyro.sample("lungCancer", dist.Bernoulli(0.01))
    cold = pyro.sample("cold", dist.Bernoulli(0.2))
    # cough forced to true without informing the causal parents (do-operator)
    return lungCancer

@pyro.infer.config_enumerate
def conditioning_model():
    lungCancer = pyro.sample("lungCancer", dist.Bernoulli(0.01))
    cold = pyro.sample("cold", dist.Bernoulli(0.2))
    cold_fires = pyro.sample("cold_fires", dist.Bernoulli(0.5))
    lung_fires = pyro.sample("lung_fires", dist.Bernoulli(0.3))
    cough = ((cold > 0) & (cold_fires > 0)) | ((lungCancer > 0) & (lung_fires > 0))
    pyro.factor("cough_obs", torch.where(cough, torch.tensor(0.0), torch.tensor(float("-inf"))))
    return lungCancer

def _marg_bool(m, site):
    p_true = m[site].log_prob(torch.tensor(1.0)).exp().item()
    return {False: 1.0 - p_true, True: p_true}

_orig = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(original_model, lambda: None)
_intv = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(intervention_model, lambda: None)
_cond = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(conditioning_model, lambda: None)

ANSWER = {
    "original": _marg_bool(_orig, "lungCancer"),
    "intervention": _marg_bool(_intv, "lungCancer"),
    "conditioning": _marg_bool(_cond, "lungCancer"),
}

02 answer overlay: webppl vs pyro, record(original, intervention, conditioning)

original

[chart omitted; legend: webppl, pyro; 2 bins]

intervention

[chart omitted; legend: webppl, pyro; 2 bins]

conditioning

[chart omitted; legend: webppl, pyro; 2 bins]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0000 (record)
solver re-derivation	accept	2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000

probmods2-conditioning / ex4.b

answer dist/bool solver accept pyro pass 0.0000

00 statement source: exercises/conditioning.md

given

A person is nice with probability 0.7 (a stable, person-specific trait). A nice person wants something from you with probability 0.2; a non-nice person wants something from you with probability 0.5 (varies per occasion). A person smiles if either of two independent channels fires: (a) if they want something, they smile with probability 0.8; otherwise with probability 0.5; (b) if they are nice, they smile with probability 0.8; otherwise with probability 0.5. A smile occurs if at least one of these two independent channels produces a smile (logical OR).

model

Niceness is a latent stable trait. On each occasion, a person independently may or may not want something, depending on their niceness. Whether they smile is determined by the OR of two independent smile-generating channels, one driven by wanting and one by niceness.

query

The marginal distribution over whether Alice smiles on a given occasion, with no observations.
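
The marginal can also be worked out by hand, since a smile fails only when both channels stay silent. The following is a minimal sketch of that arithmetic in plain Python; the function name `p_no_smile` is an illustrative assumption.

```python
P_NICE = 0.7


def p_no_smile(nice):
    """P(neither channel fires | nice), with wanting summed out."""
    p_wants = 0.2 if nice else 0.5
    # Wanting channel stays silent with prob 0.2 if wanting, 0.5 otherwise.
    silent_want = p_wants * (1 - 0.8) + (1 - p_wants) * (1 - 0.5)
    # Niceness channel stays silent with prob 0.2 if nice, 0.5 otherwise.
    silent_nice = (1 - 0.8) if nice else (1 - 0.5)
    return silent_want * silent_nice


p_smile = 1 - (P_NICE * p_no_smile(True) + (1 - P_NICE) * p_no_smile(False))
answer = {True: p_smile, False: 1 - p_smile}
print(answer)
```

The two conditional silence probabilities are 0.088 (nice) and 0.175 (not nice), so P(smile) = 1 - (0.7 × 0.088 + 0.3 × 0.175) = 0.8859.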

answer spec dist/bool

{
  "kind": "dist",
  "domain": "bool"
}

01 realizations comparing webppl vs pyro

◆ground truth

webppl

var extendedSmilesModel = function() {
  var nice = mem(function(person) { flip(.7) });
  var wantsSomething = function(person) {
    return flip(nice(person) ? .2 : .5);
  }
  var smiles = function(person, wants) {
    return (wants ? flip(.8) : flip(.5))
            || (nice(person) ? flip(.8) : flip(.5));
  }
  var wants = wantsSomething('alice');
  return smiles('alice', wants);
};
var ANSWER = (Infer({method: "enumerate"}, extendedSmilesModel));

◆ realization 0.000

python

# probmods2-conditioning/ex4.b
# Niceness is a stable trait; wanting depends on niceness; smiling is the OR of
# two independent channels (wanting-driven and niceness-driven).
# Marginal over whether Alice smiles, no observations. Exact enumeration.

@pyro.infer.config_enumerate
def model():
    nice = pyro.sample("nice", dist.Bernoulli(0.7))
    p_wants = torch.where(nice == 1.0, torch.tensor(0.2), torch.tensor(0.5))
    wants = pyro.sample("wants", dist.Bernoulli(p_wants))
    p_chan_want = torch.where(wants == 1.0, torch.tensor(0.8), torch.tensor(0.5))
    p_chan_nice = torch.where(nice == 1.0, torch.tensor(0.8), torch.tensor(0.5))
    chan_want = pyro.sample("chan_want", dist.Bernoulli(p_chan_want))
    chan_nice = pyro.sample("chan_nice", dist.Bernoulli(p_chan_nice))
    smiles = (chan_want == 1.0) | (chan_nice == 1.0)
    probs = torch.stack([(~smiles).double(), smiles.double()], dim=-1)
    pyro.sample("smiles", dist.Categorical(probs))

marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
p = marg["smiles"].probs.detach()
ANSWER = {False: float(p[0].item()), True: float(p[1].item())}

02 answer overlay: webppl vs pyro, dist/bool

[chart omitted; legend: webppl, pyro; 2 bins]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0000 (tv)
solver re-derivation	accept	2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000

probmods2-conditioning / ex4.c

answer dist/bool solver accept pyro pass 0.0000

00 statement source: exercises/conditioning.md

given

A person's niceness is a stable trait: P(nice) = 0.7. Whether the person wants something from you is drawn independently on each day, given niceness: P(wants | nice) = 0.2, P(wants | not nice) = 0.5. Smiling is produced by two channels that fire independently given wanting and niceness, and the person smiles if either channel fires. The wanting channel fires with probability 0.8 if they want something, 0.5 otherwise. The niceness channel fires with probability 0.8 if they are nice, 0.5 otherwise. You have observed the person on five previous days, and on each of those days the person was not smiling; each day's wanting was independently drawn from the prior. Today you observe the person smiling; today's wanting is also drawn independently from the prior.

model

Niceness is a fixed latent trait drawn once from the prior. Each day's wanting is drawn independently from the conditional prior given niceness. Smiling on a day is the logical OR of the two independent channels (wanting-based and niceness-based). The five past non-smiling observations and today's smiling observation are all conditioned on.

query

The posterior distribution over whether the person wants something from you today (true or false).
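
Because the days are independent once niceness is fixed, the posterior can be computed by summing over niceness alone instead of enumerating every channel flip. The sketch below illustrates that factorisation in plain Python; all function names are illustrative assumptions, not part of the benchmark.

```python
P_NICE = 0.7


def p_wants(nice):
    return 0.2 if nice else 0.5


def p_smile(wants, nice):
    """P(smile | wants, nice): OR of two independent channels."""
    silent_want = 0.2 if wants else 0.5
    silent_nice = 0.2 if nice else 0.5
    return 1 - silent_want * silent_nice


def p_no_smile_day(nice):
    """P(no smile on a past day | nice), with that day's wanting summed out."""
    return sum(
        (p_wants(nice) if w else 1 - p_wants(nice)) * (1 - p_smile(w, nice))
        for w in (True, False)
    )


joint = {True: 0.0, False: 0.0}  # unnormalised P(wantsToday, all observations)
for nice in (True, False):
    prior = P_NICE if nice else 1 - P_NICE
    past = p_no_smile_day(nice) ** 5  # five independent non-smiling days
    for wants_today in (True, False):
        pw = p_wants(nice) if wants_today else 1 - p_wants(nice)
        joint[wants_today] += prior * past * pw * p_smile(wants_today, nice)

total = joint[True] + joint[False]
posterior = {k: v / total for k, v in joint.items()}
print(posterior)
```

The five non-smiling days act only through niceness: they shift belief strongly toward "not nice", which in turn raises the probability that today's smile reflects wanting something. The posterior for wanting comes out at roughly 0.52.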

answer spec dist/bool

{
  "kind": "dist",
  "domain": "bool"
}

01 realizations comparing webppl vs pyro

◆ground truth

webppl

var extendedSmilesModel = function() {
  var nice = mem(function(person) { flip(.7) });
  var wantsSomething = function(person) {
    return flip(nice(person) ? .2 : .5);
  }
  var smiles = function(person, wants) {
    return (wants ? flip(.8) : flip(.5))
            || (nice(person) ? flip(.8) : flip(.5));
  }
  var wantsToday = wantsSomething('bob');
  condition(!smiles('bob', wantsSomething('bob')));
  condition(!smiles('bob', wantsSomething('bob')));
  condition(!smiles('bob', wantsSomething('bob')));
  condition(!smiles('bob', wantsSomething('bob')));
  condition(!smiles('bob', wantsSomething('bob')));
  condition(smiles('bob', wantsToday));
  return wantsToday;
};
var ANSWER = (Infer({method: "enumerate"}, extendedSmilesModel));

◆ realization 0.000

python

# probmods2-conditioning/ex4.c
# `nice('bob')` is memoized in WebPPL (one draw, reused everywhere), so it is a
# single sample site. `wantsSomething('bob')` is NOT memoized: each call is a
# fresh draw. Five days of not-smiling (each a fresh wantsSomething draw), then
# today's smile evaluated at wantsToday. Exact enumeration; `wantsToday` is made
# a genuine discrete sample site so compute_marginals returns its marginal.

ZERO = torch.tensor(0.0)
NEG_INF = torch.tensor(float("-inf"))

@pyro.infer.config_enumerate
def model():
    nice = pyro.sample("nice", dist.Bernoulli(0.7)).bool()

    def wants_something(name):
        p = torch.where(nice, torch.tensor(0.2), torch.tensor(0.5))
        return pyro.sample(name, dist.Bernoulli(p)).bool()

    def smiles(tag, wants):
        pw = torch.where(wants, torch.tensor(0.8), torch.tensor(0.5))
        a = pyro.sample(tag + "_a", dist.Bernoulli(pw)).bool()
        pn = torch.where(nice, torch.tensor(0.8), torch.tensor(0.5))
        b = pyro.sample(tag + "_b", dist.Bernoulli(pn)).bool()
        return a | b

    wantsToday = wants_something("wantsToday")

    # five days of NOT smiling, each with a fresh wantsSomething draw
    for i in range(5):
        w = wants_something(f"w{i}")
        s = smiles(f"day{i}", w)
        pyro.factor(f"obs{i}", torch.where(~s, ZERO, NEG_INF))

    # today: smiles, evaluated at wantsToday
    s_today = smiles("today", wantsToday)
    pyro.factor("obs_today", torch.where(s_today, ZERO, NEG_INF))

    pyro.sample("wantsToday_out", dist.Bernoulli(wantsToday.double()))

marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
m = marg["wantsToday_out"]
sup = m.enumerate_support()
probs = m.log_prob(sup).exp()
ANSWER = {bool(s.item()): float(p.item()) for s, p in zip(sup, probs)}

02 answer overlay: webppl vs pyro, dist/bool

[chart omitted; legend: webppl, pyro; 2 bins]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0000 (tv)
solver re-derivation	accept	2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000

probmods2-conditioning / ex5.a

answer record(rain, sprinkler) solver accept pyro pass 0.0000

00 statement source: exercises/conditioning.md

given

A sprinkler runs on any given morning with probability 0.5, independently. It rains on any given morning with probability 0.3, independently. The lawn is wet if the sprinkler ran, if it rained, or if both occurred. One morning the lawn is observed to be wet.

model

Rain and sprinkler are independent Bernoulli events. The lawn is wet if and only if at least one of them occurred. The lawn being wet is observed.

query

A record with two fields: 'rain' — the posterior distribution over whether it rained (true/false); 'sprinkler' — the posterior distribution over whether the sprinkler ran (true/false).
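
With only two binary causes, both posteriors can be checked by enumerating the four worlds and keeping those where the lawn is wet. The following is a minimal plain-Python sketch of that check; the variable names are illustrative assumptions.

```python
from itertools import product

P_SPRINKLER, P_RAIN = 0.5, 0.3

rain_w = {True: 0.0, False: 0.0}
sprinkler_w = {True: 0.0, False: 0.0}
for sprinkler, rain in product([True, False], repeat=2):
    prob = ((P_SPRINKLER if sprinkler else 1 - P_SPRINKLER)
            * (P_RAIN if rain else 1 - P_RAIN))
    if not (sprinkler or rain):
        continue  # the lawn is dry in this world, so it is ruled out
    rain_w[rain] += prob
    sprinkler_w[sprinkler] += prob

evidence = sum(rain_w.values())  # P(wet lawn) = 1 - 0.5 * 0.7 = 0.65
answer = {
    "rain": {k: v / evidence for k, v in rain_w.items()},
    "sprinkler": {k: v / evidence for k, v in sprinkler_w.items()},
}
print(answer)
```

The results are P(rain | wet) = 0.3 / 0.65, about 0.46, and P(sprinkler | wet) = 0.5 / 0.65, about 0.77.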

answer spec record(rain, sprinkler)

{
  "kind": "record",
  "fields": {
    "rain": {
      "kind": "dist",
      "domain": "bool"
    },
    "sprinkler": {
      "kind": "dist",
      "domain": "bool"
    }
  }
}

01 realizations comparing webppl vs pyro

◆ground truth

webppl

var ANSWER = (({
  rain: Infer({method: "enumerate"}, function() {
    var sprinkler = flip();
    var rain = flip(0.3);
    var wetLawn = sprinkler || rain;
    condition(wetLawn);
    return rain;
  }),
  sprinkler: Infer({method: "enumerate"}, function() {
    var sprinkler = flip();
    var rain = flip(0.3);
    var wetLawn = sprinkler || rain;
    condition(wetLawn);
    return sprinkler;
  })
}));

◆ realization 0.000

python

def make_model(ret):
    @pyro.infer.config_enumerate
    def model():
        sprinkler = pyro.sample('sprinkler', dist.Bernoulli(0.5)).bool()
        rain = pyro.sample('rain', dist.Bernoulli(0.3)).bool()
        wet = sprinkler | rain
        pyro.factor('wet', torch.where(wet, torch.tensor(0.0), torch.tensor(float('-inf'))))
        return ret(rain, sprinkler)
    return model


def marginal_bool(site, ret):
    model = make_model(ret)
    marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
    m = marg[site]
    sup = m.enumerate_support()
    probs = m.log_prob(sup).exp()
    out = {}
    for v, p in zip(sup.tolist(), probs.tolist()):
        out[bool(v)] = out.get(bool(v), 0.0) + p
    return out


ANSWER = {
    'rain': marginal_bool('rain', lambda rain, sprinkler: rain),
    'sprinkler': marginal_bool('sprinkler', lambda rain, sprinkler: sprinkler),
}

02 answer overlay: webppl vs pyro, record(rain, sprinkler)

rain

[chart omitted; legend: webppl, pyro; 2 bins]

sprinkler

[chart omitted; legend: webppl, pyro; 2 bins]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0000 (record)
solver re-derivation	accept	2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000

probmods2-conditioning / ex5.b

answer dist/bool solver accept pyro pass 0.0000

00 statement source: exercises/conditioning.md

given

Rain falls on a given morning with probability 0.3. Two people (me and Kelsey) each have their own sprinkler; each sprinkler turns on independently with probability 0.5. One morning both lawns are wet.

model

A lawn is wet if rain falls that morning or if that lawn's sprinkler runs; rain affects both lawns simultaneously, while each sprinkler affects only its own lawn.

query

The posterior distribution over whether it rained that morning.

answer spec dist/bool

{
  "kind": "dist",
  "domain": "bool"
}

01 realizations comparing webppl vs pyro

◆ground truth

webppl

var ANSWER = (Infer({method: "enumerate"}, function() {
  var rain = flip(0.3);
  var mySprinkler = flip();
  var herSprinkler = flip();
  var myLawnIsWet = mySprinkler || rain;
  var herLawnIsWet = herSprinkler || rain;
  condition(myLawnIsWet && herLawnIsWet);
  return rain;
}));

◆ realization 0.000

python

@pyro.infer.config_enumerate
def model():
    rain = pyro.sample("rain", dist.Bernoulli(0.3)).bool()
    mySprinkler = pyro.sample("mySprinkler", dist.Bernoulli(0.5)).bool()
    herSprinkler = pyro.sample("herSprinkler", dist.Bernoulli(0.5)).bool()
    myLawnIsWet = mySprinkler | rain
    herLawnIsWet = herSprinkler | rain
    ev = myLawnIsWet & herLawnIsWet
    pyro.factor("cond", torch.where(ev, torch.tensor(0.0), torch.tensor(float("-inf"))))

marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
m = marg["rain"]
sup = m.enumerate_support()
probs = m.log_prob(sup).exp()
ANSWER = {bool(s.item()): float(p.item()) for s, p in zip(sup, probs)}

02 answer overlay: webppl vs pyro, dist/bool

[chart omitted; legend: webppl, pyro; 2 bins]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0000 (tv)
solver re-derivation	accept	2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000

probmods2-conditioning / ex5.c

answer dist/bool solver accept pyro pass 0.0000

00 statement source: exercises/conditioning.md

given

Rain falls on a given morning with probability 0.3. Five people — me, Kelsey, Kevin, Manu, and Josh — each have an independent sprinkler that runs with probability 0.5. One morning all five lawns are wet.

model

A lawn is wet if rain falls or if that lawn's sprinkler runs; rain affects all lawns simultaneously, while each sprinkler affects only its own lawn.

query

The posterior distribution over whether it rained that morning.
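
Rain explains every wet lawn at once, whereas the no-rain explanation needs every sprinkler to have run, so the posterior has a simple closed form in the number of lawns. The sketch below assumes a hypothetical helper `p_rain_given_all_wet`; it is not taken from the benchmark.

```python
def p_rain_given_all_wet(n_lawns, p_rain=0.3, p_sprinkler=0.5):
    """P(rain | all n lawns wet), each lawn having its own independent sprinkler."""
    # If it rained, every lawn is wet with certainty.
    wet_and_rain = p_rain
    # If it did not rain, all n sprinklers must have run.
    wet_and_dry = (1 - p_rain) * p_sprinkler ** n_lawns
    return wet_and_rain / (wet_and_rain + wet_and_dry)


p_rain = p_rain_given_all_wet(5)
answer = {True: p_rain, False: 1 - p_rain}
for n in (1, 2, 5):
    print(n, round(p_rain_given_all_wet(n), 4))
```

The posterior for rain climbs from about 0.46 with one wet lawn to about 0.63 with two and about 0.93 with five, because each extra wet lawn halves the weight of the no-rain explanation.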

answer spec dist/bool

{
  "kind": "dist",
  "domain": "bool"
}

01 realizations comparing webppl vs pyro

◆ground truth

webppl

var ANSWER = (Infer({method: "enumerate"}, function() {
  var rain = flip(0.3);
  var sprinkler = mem(function(person) { return flip() });
  var wetLawn = function(person) { return rain || sprinkler(person) };
  condition(wetLawn("me"));
  condition(wetLawn("Kelsey"));
  condition(wetLawn("Kevin"));
  condition(wetLawn("Manu"));
  condition(wetLawn("Josh"));
  return rain;
}));

◆ realization 0.000

python

# Rain (p=0.3) plus five independent sprinklers (p=0.5 each); all five lawns
# wet. A lawn is wet if rain OR its own sprinkler runs. Posterior over rain by
# exact enumeration.

people = ["me", "Kelsey", "Kevin", "Manu", "Josh"]


@pyro.infer.config_enumerate
def model():
    rain = pyro.sample("rain", dist.Bernoulli(0.3))
    sprinklers = [
        pyro.sample(f"sprinkler_{p}", dist.Bernoulli(0.5)) for p in people
    ]
    for p, s in zip(people, sprinklers):
        wet = ((rain > 0) | (s > 0)).float()
        logw = torch.where(
            wet > 0, torch.tensor(0.0), torch.tensor(float("-inf"))
        )
        pyro.factor(f"wet_{p}", logw)
    return rain


marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(
    model, lambda: None
)
rain_marg = marg["rain"]
ANSWER = {
    True: torch.exp(rain_marg.log_prob(torch.tensor(1.0))).item(),
    False: torch.exp(rain_marg.log_prob(torch.tensor(0.0))).item(),
}

02 answer overlay: webppl vs pyro, dist/bool

[chart omitted; legend: webppl, pyro; 2 bins]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0000 (tv)
solver re-derivation	accept	2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000

probmods2-conditioning / ex6.c

answer dist/finite solver accept pyro pass 0.0000

00 statement source: exercises/conditioning.md

given

A machine draws one letter at random from the word "game": vowels (a, e) are drawn with probability 0.45 each, and consonants (g, m) with probability 0.05 each. Bob's probability of winning given the drawn letter is 1/k^2, where k is that letter's 1-based position in the string "game" (g=1, a=2, m=3, e=4).

model

One letter is sampled from the distribution above. Given the letter, Bob wins with probability 1/k^2, where k is the letter's position, and loses otherwise. We observe that Bob won.

query

The posterior distribution over which letter was drawn, given that Bob won.
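
Since the support has only four letters, Bayes' rule can be applied exactly with rational arithmetic. The following is a minimal sketch using Python's standard `fractions` module; the variable names are illustrative assumptions.

```python
from fractions import Fraction

letters = ["g", "a", "m", "e"]
prior = {
    "g": Fraction(1, 20), "a": Fraction(9, 20),
    "m": Fraction(1, 20), "e": Fraction(9, 20),
}
# Win probability 1/k^2, with k the 1-based position in "game".
likelihood = {letter: Fraction(1, (k + 1) ** 2) for k, letter in enumerate(letters)}

joint = {letter: prior[letter] * likelihood[letter] for letter in letters}
evidence = sum(joint.values())  # P(Bob won)
posterior = {letter: joint[letter] / evidence for letter in letters}
answer = {letter: float(p) for letter, p in posterior.items()}
print(posterior)
```

The exact posterior is g = 144/565, a = 324/565, m = 16/565, e = 81/565. The letter "a" dominates because it combines a high prior with an early position.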

answer spec dist/finite

{
  "kind": "dist",
  "domain": "finite",
  "support": [
    "g",
    "a",
    "m",
    "e"
  ]
}

01 realizations comparing webppl vs pyro

◆ground truth

webppl

var checkVowel = function(letter) { _.includes(['a', 'e', 'i', 'o', 'u'], letter) };
var letterVals = ['g', 'a', 'm', 'e'];
var letterProbs = map(function(letter) { checkVowel(letter) ? 0.45 : 0.05 }, letterVals);
var letters = Categorical({vs: letterVals, ps: letterProbs});
var ANSWER = (Infer({method: 'enumerate'}, function() {
  var letter = sample(letters);
  var position = letterVals.indexOf(letter) + 1;
  var winProb = 1 / Math.pow(position, 2);
  condition(flip(winProb));
  return letter;
}));

◆ realization 0.000

python

# Letter drawn from 'game' (vowels a,e at 0.45; consonants g,m at 0.05),
# Bob wins with prob 1/k^2 (k = 1-based position).  Condition on Bob won.
# Query: posterior over the drawn letter.  Exact enumeration.

NEG_INF = float("-inf")

letter_vals = ["g", "a", "m", "e"]
letter_probs = torch.tensor([0.05, 0.45, 0.05, 0.45], dtype=torch.float64)
letter_logits = torch.log(letter_probs)
win_probs = torch.tensor([1.0 / ((i + 1) ** 2) for i in range(len(letter_vals))],
                         dtype=torch.float64)

@pyro.infer.config_enumerate
def model():
    letter = pyro.sample("letter", dist.Categorical(logits=letter_logits))
    p_win = win_probs[letter]
    # observe Bob won: flip(winProb) == True
    pyro.sample("won", dist.Bernoulli(p_win), obs=torch.tensor(1.0, dtype=torch.float64))
    return letter

marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
probs = marg["letter"].probs
ANSWER = {letter_vals[i]: float(probs[i]) for i in range(len(letter_vals))}

02 answer overlay: webppl vs pyro, dist/finite

[chart omitted; legend: webppl, pyro; 4 bins]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0000 (tv)
solver re-derivation	accept	2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000

probmods2-conditioning / ex6.d

answer dist/finite solver accept pyro pass 0.0000

00 statement source: exercises/conditioning.md

given

A casino game draws one letter from the ordered set [g, a, m, e] (positions 1 through 4). Consonants g and m each have prior probability 0.05; vowels a and e each have prior probability 0.45. A player at position k wins with probability 1/k². Bob played and won.

model

A letter is drawn according to its prior probability. Given the drawn letter's position k, the player wins with probability 1/k²; whether the player won is observed.

query

The posterior distribution over whether the drawn letter is a vowel or a consonant, given that Bob won.
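
The vowel/consonant posterior follows by pooling each letter's unnormalised weight (prior times win probability) into its class before normalising. A minimal plain-Python sketch, with illustrative variable names:

```python
VOWELS = set("aeiou")
letters = "game"

weights = {"vowel": 0.0, "consonant": 0.0}
for k, letter in enumerate(letters, start=1):
    is_vowel = letter in VOWELS
    prior = 0.45 if is_vowel else 0.05
    # Unnormalised posterior weight: prior * P(win | position k).
    weights["vowel" if is_vowel else "consonant"] += prior / k ** 2

total = sum(weights.values())
answer = {label: w / total for label, w in weights.items()}
print(answer)
```

In exact terms the result is 81/113 for "vowel" and 32/113 for "consonant". Winning shifts belief toward consonants relative to the 0.9 / 0.1 prior, since "g" sits in the position with the highest win probability.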

answer spec dist/finite

{
  "kind": "dist",
  "domain": "finite",
  "support": [
    "vowel",
    "consonant"
  ]
}

01 realizations comparing webppl vs pyro

◆ground truth

webppl

var checkVowel = function(letter) { _.includes(['a', 'e', 'i', 'o', 'u'], letter) };
var letterVals = ['g', 'a', 'm', 'e'];
var letterProbs = map(function(letter) { checkVowel(letter) ? 0.45 : 0.05 }, letterVals);
var letters = Categorical({vs: letterVals, ps: letterProbs});
var ANSWER = (Infer({method: 'enumerate'}, function() {
  var letter = sample(letters);
  var position = letterVals.indexOf(letter) + 1;
  var winProb = 1 / Math.pow(position, 2);
  condition(flip(winProb));
  return checkVowel(letter) ? 'vowel' : 'consonant';
}));

◆ realization 0.000

python

letter_vals = ["g", "a", "m", "e"]
vowels = ["a", "e", "i", "o", "u"]


def check_vowel(letter):
    return letter in vowels


letter_probs = torch.tensor([0.45 if check_vowel(l) else 0.05 for l in letter_vals])


@pyro.infer.config_enumerate
def model():
    letter = pyro.sample("letter", dist.Categorical(letter_probs))
    # position k = index + 1; win probability 1/k^2; condition on a win via log-weight.
    positions = torch.arange(1, len(letter_vals) + 1).double()
    win_probs = 1.0 / positions.pow(2)
    log_win = torch.log(win_probs)
    pyro.factor("won", log_win[letter])
    return letter


marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
d = marg["letter"]

p_vowel = 0.0
p_consonant = 0.0
for i, l in enumerate(letter_vals):
    p = float(torch.exp(d.log_prob(torch.tensor(i))))
    if check_vowel(l):
        p_vowel += p
    else:
        p_consonant += p

ANSWER = {"vowel": p_vowel, "consonant": p_consonant}

02 answer overlay: webppl vs pyro, dist/finite

[chart omitted; legend: webppl, pyro; 2 bins]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0000 (tv)
solver re-derivation	accept	2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000

probmods2-generative-models / ex1.b

answer record(p1, p2, p3) solver accept pyro pass 0.0210

00 statement source: exercises/generative-models.md

given

Three programs each produce a random boolean. Program 1: flip a fair coin; if it comes up heads, return a second flip with probability 0.7 of true, and if tails, return a second flip with probability 0.1 of true. Program 2: flip a fair coin and use the result to choose the bias of a single second flip: 0.7 if heads, 0.1 if tails. Program 3: a single flip with probability 0.4 of true.

model

Each program independently generates a boolean by composing fair and biased coin flips in the ways described.

query

Return a record with three fields — one per program — where each field holds a list of 1000 independent draws from that program's marginal distribution.
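
All three programs share the same marginal, P(true) = 0.5 × 0.7 + 0.5 × 0.1 = 0.4, so the three lists of draws should show similar frequencies. The sketch below imitates the programs with Python's standard `random` module; the `flip` helper and the fixed seed are assumptions made for reproducibility, not part of the benchmark harness.

```python
import random

rng = random.Random(0)  # fixed seed, an arbitrary choice for reproducibility


def flip(p=0.5):
    """Return True with probability p."""
    return rng.random() < p


def program1():
    # Branch on a fair coin, then flip with the chosen bias.
    return flip(0.7) if flip() else flip(0.1)


def program2():
    # Use a fair coin to pick the bias of a single flip.
    return flip(0.7 if flip() else 0.1)


def program3():
    return flip(0.4)


answer = {name: [prog() for _ in range(1000)]
          for name, prog in (("p1", program1), ("p2", program2), ("p3", program3))}

exact_p_true = 0.5 * 0.7 + 0.5 * 0.1  # programs 1 and 2; program 3 is 0.4 directly
freqs = {name: sum(draws) / len(draws) for name, draws in answer.items()}
print(exact_p_true, freqs)
```

With 1000 draws per program, the empirical frequencies fluctuate around 0.4 by a few percentage points, which is why the verification for this problem uses a nonzero tolerance.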

answer spec record(p1, p2, p3)

{
  "kind": "record",
  "fields": {
    "p1": {
      "kind": "dist",
      "domain": "bool"
    },
    "p2": {
      "kind": "dist",
      "domain": "bool"
    },
    "p3": {
      "kind": "dist",
      "domain": "bool"
    }
  }
}

01 realizations comparing webppl vs pyro

◆ground truth

webppl

var ANSWER = (({
  p1: repeat(1000, function() { return flip() ? flip(.7) : flip(.1); }),
  p2: repeat(1000, function() { return flip(flip() ? .7 : .1); }),
  p3: repeat(1000, function() { return flip(.4); })
}));

◆ realization 0.021

python

# Program 1: flip a fair coin; if heads flip again at 0.7, else flip at 0.1.
def program1(i):
    first = pyro.sample(f"p1_first_{i}", dist.Bernoulli(0.5)) > 0
    second = pyro.sample(f"p1_second_{i}", dist.Bernoulli(0.7 if first else 0.1)) > 0
    return bool(second)

# Program 2: flip a fair coin, use it to pick the second flip's bias (0.7 / 0.1).
def program2(i):
    first = pyro.sample(f"p2_first_{i}", dist.Bernoulli(0.5)) > 0
    second = pyro.sample(f"p2_second_{i}", dist.Bernoulli(0.7 if first else 0.1)) > 0
    return bool(second)

# Program 3: a single flip at 0.4.
def program3(i):
    return bool(pyro.sample(f"p3_{i}", dist.Bernoulli(0.4)) > 0)

p1_draws = [program1(i) for i in range(1000)]
p2_draws = [program2(i) for i in range(1000)]
p3_draws = [program3(i) for i in range(1000)]

ANSWER = {"p1": p1_draws, "p2": p2_draws, "p3": p3_draws}

02 answer overlay: webppl vs pyro, record(p1, p2, p3)

p1

[chart omitted; legend: webppl, pyro; 2 bins]

p2

[chart omitted; legend: webppl, pyro; 2 bins]

p3

[chart omitted; legend: webppl, pyro; 2 bins]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0590 (record)
solver re-derivation	accept	2/2 solvers · d=[0.019, 0.019] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0210 ≤ tol 0.1180 · floors 0.0450/0.0590

probmods2-generative-models / ex1.c

answer dist/bool solver accept pyro pass 0.0050

00 statement source: exercises/generative-models.md

given

A fair coin has probability 0.5 of coming up heads; a biased coin has probability 0.8 of heads. The following three expressions all produce true with probability 0.4: (1) with probability 0.5 return the result of a 0.7-probability flip, otherwise a 0.1-probability flip; (2) flip a fair coin and use probability 0.7 if heads, 0.1 if tails; (3) a single flip with probability 0.4.

model

A boolean is generated by composing one or more coin flips.

query

Write a new expression — structurally different from the three listed — whose marginal probability of returning true is also 0.4. The expression will be evaluated independently for each seed; the marginal is estimated from the collection of results.

answer spec dist/bool

{
  "kind": "dist",
  "domain": "bool",
  "protocol": "draws"
}

01 realizations comparing webppl vs pyro

◆ground truth

webppl

var ANSWER = (flip() ? false : flip(.8));

◆ realization 0.005

python

# probmods2-generative-models/ex1.c
# A new expression whose marginal P(true) = 0.4, structurally different from the
# three listed: with prob 0.5 return False, otherwise return a flip(0.8).
# P(true) = 0.5 * 0.8 = 0.4. protocol: draws -> one draw, no inference.

gate = pyro.sample("gate", dist.Bernoulli(0.5))
if gate == 1.0:
    ANSWER = False
else:
    inner = pyro.sample("inner", dist.Bernoulli(0.8))
    ANSWER = bool(inner.item() == 1.0)

02 answer overlay: webppl vs pyro, dist/bool

[chart omitted; legend: webppl, pyro; 2 bins]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0050 (tv)
solver re-derivation	accept	1/2 solvers · d=[0.220, 0.030] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0050 ≤ tol 0.0300 · floors 0.0150/0.0050

probmods2-generative-models / ex2.b

answer dist/finite solver accept pyro pass 0.0200

00 statement source: exercises/generative-models.md

given

A single fair coin is flipped once per run.

model

One coin flip is performed, and that flip's boolean result is placed in all three positions of a length-3 list.

query

One draw from the process: the list of three booleans produced by a single run.

answer spec dist/finite

{
  "kind": "dist",
  "domain": "finite",
  "protocol": "draws",
  "support": [
    [
      true,
      true,
      true
    ],
    [
      false,
      false,
      false
    ]
  ]
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var foo = mem(function() { return flip(); });
var ANSWER = ([foo(), foo(), foo()]);

◆ realization 0.020

python

# One fair coin flip per run; the single flip's value fills all three positions
# of a length-3 list. protocol 'draws': bind ANSWER to one draw, run no
# inference -- the harness reseeds and aggregates draws across many runs.

flip = bool(pyro.sample("coin", dist.Bernoulli(torch.tensor(0.5))).item())
ANSWER = [flip, flip, flip]

02 answer overlay — webppl vs pyro dist/finite

[chart: webppl vs pyro overlay, 2 bins]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0550 (tv)
solver re-derivation	accept	2/2 solvers · d=[0.020, 0.020] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0200 ≤ tol 0.1100 · floors 0.0300/0.0550

probmods2-generative-models / ex2.c

answer dist/finite solver accept pyro pass 0.0450

00 statement source: exercises/generative-models.md

given

A function maps each integer argument to an independent fair-coin toss with probability 0.5 of heads, but the outcome for any given argument is fixed once determined — calling the function twice with the same argument always returns the same boolean.

model

Three boolean values are generated: the first and second by calling the function with the same argument, the third by calling it with a different argument.

query

The marginal distribution over the resulting list of three booleans. The program is evaluated once per seed; results are pooled across seeds to estimate the distribution.

answer spec dist/finite

{
  "kind": "dist",
  "domain": "finite",
  "protocol": "draws",
  "support": [
    [
      true,
      true,
      true
    ],
    [
      true,
      true,
      false
    ],
    [
      false,
      false,
      true
    ],
    [
      false,
      false,
      false
    ]
  ]
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var foo = mem(function(x) { return flip(); });
var ANSWER = ([foo(0), foo(0), foo(1)]);

◆ realization 0.045

python

# probmods2-generative-models/ex2.c
# foo is a memoized coin: foo(x) ~ Bernoulli(0.5), fixed per argument x.
# Return [foo(0), foo(0), foo(1)] for one execution (draws protocol).

_memo = {}
def foo(x):
    if x not in _memo:
        _memo[x] = bool(pyro.sample(f"foo_{x}", dist.Bernoulli(0.5)).item())
    return _memo[x]

ANSWER = [foo(0), foo(0), foo(1)]

02 answer overlay — webppl vs pyro dist/finite

[chart: webppl vs pyro overlay, 4 bins]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0550 (tv)
solver re-derivation	accept	2/2 solvers · d=[0.050, 0.050] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0450 ≤ tol 0.1700 · floors 0.0850/0.0550
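
Memoization is what shrinks the support from eight lists to four: positions 0 and 1 always reuse the same coin. A small enumeration sketch in plain Python (no Pyro; the variable names are illustrative) makes the exact target distribution explicit.

```python
from fractions import Fraction
from itertools import product

# Enumerate the two underlying coins, foo(0) and foo(1). Because foo is
# memoized, the list [foo(0), foo(0), foo(1)] repeats the first coin.
exact = {}
for coin0, coin1 in product([True, False], repeat=2):
    outcome = (coin0, coin0, coin1)
    exact[outcome] = exact.get(outcome, Fraction(0)) + Fraction(1, 4)

for outcome, prob in exact.items():
    print(outcome, prob)
```

Each of the four reachable lists gets probability 1/4, and lists whose first two entries differ never occur.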

probmods2-generative-models / ex4.b

answer dist/finite solver accept pyro pass 0.0033

00 statement source: exercises/generative-models.md

given

A person has allergies with probability 0.3 and a cold with probability 0.2. Allergies and cold are independent of each other.

model

A person sneezes if they have a cold or have allergies (logical OR). A person has a fever if and only if they have a cold.

query

The joint distribution over whether the person sneezes and whether they have a fever.

answer spec dist/finite

{
  "kind": "dist",
  "domain": "finite",
  "labels": {
    "record": {
      "sneeze": "bool",
      "fever": "bool"
    }
  }
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var ANSWER = (Infer({method: "enumerate"}, function() {
  var allergies = flip(0.3);
  var cold = flip(0.2);
  var sneeze = cold || allergies;
  var fever = cold;
  return {sneeze: sneeze, fever: fever};
}));

◆ realization 0.003

python

@pyro.infer.config_enumerate
def model():
    allergies = pyro.sample("allergies", dist.Bernoulli(0.3)).bool()
    cold = pyro.sample("cold", dist.Bernoulli(0.2)).bool()
    sneeze = cold | allergies
    fever = cold
    return sneeze, fever

serving = pyro.infer.infer_discrete(
    pyro.infer.config_enumerate(model), first_available_dim=-1
)

counts = Counter()
N = 20000
for _ in range(N):
    sneeze, fever = serving()
    key = json.dumps({"sneeze": bool(sneeze.item()), "fever": bool(fever.item())}, sort_keys=True)
    counts[key] += 1

ANSWER = {k: v / N for k, v in counts.items()}

02 answer overlay — webppl vs pyro dist/finite

[chart: webppl vs pyro overlay, 3 bins]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0000 (tv)
solver re-derivation	accept	2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0033 ≤ tol 0.0181 · floors 0.0091/0.0000
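
The exact joint that both realizations target can be obtained by enumerating the two latent causes. The sketch below is plain Python rather than Pyro, and the constant names are illustrative.

```python
from itertools import product

P_ALLERGIES = 0.3
P_COLD = 0.2

# Sum the prior weight of every (allergies, cold) cell into the
# (sneeze, fever) outcome that cell produces.
joint = {}
for allergies, cold in product([True, False], repeat=2):
    weight = (P_ALLERGIES if allergies else 1 - P_ALLERGIES) * (
        P_COLD if cold else 1 - P_COLD
    )
    outcome = (cold or allergies, cold)  # (sneeze, fever)
    joint[outcome] = joint.get(outcome, 0.0) + weight

for outcome, prob in sorted(joint.items()):
    print(outcome, round(prob, 4))
```

Fever without sneezing is impossible in this model (a cold causes both), which is consistent with the overlay reporting three bins rather than four.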

probmods2-generative-models / ex4.c

answer dist/finite solver accept pyro pass 0.0000

00 statement source: exercises/generative-models.md

given

Each person independently has allergies with probability 0.3 and a cold with probability 0.2; these are independent, and each person's disease state is consistent throughout a single scenario (allergies and cold are person-level traits, not re-sampled per query).

model

A person sneezes if they have a cold or have allergies (logical OR). A person has a fever if and only if they have a cold. Bob's symptoms are evaluated using his consistent disease state.

query

The joint distribution over whether Bob sneezes and whether Bob has a fever.

answer spec dist/finite

{
  "kind": "dist",
  "domain": "finite",
  "labels": {
    "record": {
      "sneeze": "bool",
      "fever": "bool"
    }
  }
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var ANSWER = (Infer({method: "enumerate"}, function() {
  var allergies = mem(function(person) { return flip(.3); });
  var cold = mem(function(person) { return flip(.2); });
  var sneeze = function(person) { return cold(person) || allergies(person); };
  var fever = function(person) { return cold(person); };
  return {sneeze: sneeze('bob'), fever: fever('bob')};
}));

◆ realization 0.000

python

# Each person has allergies (p=0.3) and a cold (p=0.2), independent. Bob sneezes
# if cold OR allergies; has a fever iff cold. No conditioning -- the queried
# joint distribution is the prior over (sneeze, fever). The two binary latents are
# enumerated and the joint over the derived (sneeze, fever) outcome is read off a
# single enumerated outcome site by exact marginalization.

# Encode the joint outcome (sneeze, fever) as one categorical latent so that
# exact enumeration over (allergies, cold) yields its marginal directly. Outcome
# index = sneeze*2 + fever, with outcomes ordered below.
outcomes = [(False, False), (False, True), (True, False), (True, True)]


@pyro.infer.config_enumerate
def model():
    allergies = pyro.sample("allergies", dist.Bernoulli(0.3))
    cold = pyro.sample("cold", dist.Bernoulli(0.2))
    # Tensor-valued under enumeration: keep everything in torch so the derived
    # (sneeze, fever) pair is computed per enumeration cell.
    sneeze = ((cold > 0) | (allergies > 0)).long()       # 0/1, broadcasts
    fever = (cold > 0).long()
    out_idx = sneeze * 2 + fever                          # in {0,1,2,3}
    # One-hot logits over the 4 outcomes selecting the derived outcome per cell.
    labels = torch.arange(4)
    onehot = torch.where(
        out_idx.unsqueeze(-1) == labels,
        torch.tensor(0.0),
        torch.tensor(float("-inf")),
    )
    return pyro.sample("outcome", dist.Categorical(logits=onehot))


marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(
    model, lambda: None
)
out_marg = marg["outcome"]
# Record-labeled finite outcomes: each key is the JSON object {sneeze, fever};
# the harness parses these keys back into the labeled record.
ANSWER = {
    json.dumps({"sneeze": outcomes[i][0], "fever": outcomes[i][1]}):
        torch.exp(out_marg.log_prob(torch.tensor(i))).item()
    for i in range(len(outcomes))
}

02 answer overlay — webppl vs pyro dist/finite

[chart: webppl vs pyro overlay, 3 bins]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0000 (tv)
solver re-derivation	accept	2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000

probmods2-generative-models / ex5.b

answer dist/finite solver accept pyro pass 0.0040

00 statement source: exercises/generative-models.md

given

A fair coin has weight 0.5 (equal probability of heads or tails). A bent coin is derived from a fair coin as follows: if the fair coin shows heads, the bent coin flips a new coin with weight 0.7; if the fair coin shows tails, the bent coin flips a new coin with weight 0.1. Inference uses forward sampling with 10000 samples.

model

Each toss of the bent coin draws a result from the fair coin; based on that result it draws a second coin with a higher or lower bias and returns that second coin's result.

query

The marginal distribution over outcomes of a single toss of the bent coin, estimated by forward sampling with 10000 samples. Represent the outcome as the string 'h' for heads and 't' for tails.

answer spec dist/finite

{
  "kind": "dist",
  "domain": "finite",
  "support": [
    "h",
    "t"
  ]
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var makeCoin = function(weight) {
  return function() {
    return flip(weight) ? 'h' : 't';
  };
};
var bend = function(coin) {
  return function() {
    return coin() == 'h' ? makeCoin(.7)() : makeCoin(.1)();
  };
};

var fairCoin = makeCoin(.5);
var bentCoin = bend(fairCoin);
var ANSWER = (Infer({method: 'forward', samples: 10000}, bentCoin));

◆ realization 0.004

python

# Bent coin: a fair coin (0.5) selects a second coin (0.7 if heads, 0.1 if tails);
# return the second coin's result.  Forward sampling, 10000 samples, over {h,t}.

def bent_coin():
    fair = pyro.sample("fair", dist.Bernoulli(0.5))
    weight = 0.7 if bool(fair.item()) else 0.1
    second = pyro.sample("second", dist.Bernoulli(weight))
    return "h" if bool(second.item()) else "t"

num_samples = 10000
outcomes = [bent_coin() for _ in range(num_samples)]
counts = Counter(outcomes)
ANSWER = {
    "h": counts["h"] / num_samples,
    "t": counts["t"] / num_samples,
}

02 answer overlay — webppl vs pyro dist/finite

[chart: webppl vs pyro overlay, 2 bins]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0085 (tv)
solver re-derivation	accept	2/2 solvers · d=[0.003, 0.003] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0040 ≤ tol 0.0254 · floors 0.0127/0.0085
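
For a two-outcome distribution, the total-variation distance between a sampled estimate and the exact answer is just the absolute error in P('h'), so the binomial standard error gives a rough sense of how far a 10000-sample run can drift. A minimal stdlib sketch follows; the variable names are illustrative, and it makes no claim about how the harness computes its floors.

```python
import math

# Exact marginal of the bent coin: fair gate, then a 0.7 or a 0.1 coin.
p_heads = 0.5 * 0.7 + 0.5 * 0.1

# Standard error of the sample proportion after n independent tosses.
n = 10000
std_error = math.sqrt(p_heads * (1 - p_heads) / n)

print(round(p_heads, 4), round(std_error, 4))  # 0.4 0.0049
```

Distances of a few thousandths between two independent runs are therefore the expected order of magnitude for this sample size.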

probmods2-generative-models / ex6.b

answer dist/int solver accept pyro pass 0.0277

00 statement source: exercises/generative-models.md

given

A fair coin is flipped at each step (probability 0.5 heads).

model

A non-negative integer is generated recursively: with probability 0.5 the value is 0; otherwise the value is 1 plus an independent draw from the same process. This defines a geometric distribution on the non-negative integers with success probability 0.5.

query

The empirical distribution over the recursively generated integer, estimated by 10000 independent forward samples.

answer spec dist/int

{
  "kind": "dist",
  "domain": "int",
  "protocol": "object"
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var geometric = function() {
  return flip() ? 0 : 1 + geometric();
};
var ANSWER = (Infer({method: "forward", samples:10000}, geometric));

◆ realization 0.028

python

# Recursive geometric process: each step a fair coin decides 0 vs 1 + recurse.
def geometric(trial):
    if bool(pyro.sample(f"f{trial}_0", dist.Bernoulli(0.5))):
        return 0
    n = 1
    i = 1
    while True:
        if bool(pyro.sample(f"f{trial}_{i}", dist.Bernoulli(0.5))):
            return n
        n += 1
        i += 1


# Empirical distribution from 10000 independent forward samples.
counts = Counter()
for s in range(10000):
    counts[geometric(s)] += 1

total = sum(counts.values())
ANSWER = {k: counts[k] / total for k in counts}

02 answer overlay — webppl vs pyro dist/int

[chart: webppl vs pyro overlay, 13 bins · 0 … 13]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0441 (w1)
solver re-derivation	accept	2/2 solvers · d=[0.026, 0.026] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0277 ≤ tol 0.0882 · floors 0.0371/0.0441
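
The recursion has a closed form: the value n requires n continues followed by one stop, so P(n) = 0.5^(n+1). The sketch below tabulates that pmf in plain Python (the function name `geometric_pmf` is hypothetical) and estimates how many of 10000 samples would be expected to land beyond 13.

```python
def geometric_pmf(n, p_stop=0.5):
    """P(value == n): n continues, each with prob 1 - p_stop, then one stop."""
    return (1 - p_stop) ** n * p_stop


pmf = [geometric_pmf(n) for n in range(14)]  # values 0 .. 13
tail = 1 - sum(pmf)                          # P(value >= 14)
expected_beyond = 10000 * tail               # expected count among 10000 samples

for n, p in enumerate(pmf[:5]):
    print(n, p)
print("expected samples >= 14:", round(expected_beyond, 2))
```

Under these assumptions fewer than one sample in 10000 is expected above 13, so an empirical histogram that ends near 13 is unsurprising.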

probmods2-generative-models / ex7.a

answer dist/finite solver accept pyro pass 0.0350

00 statement source: exercises/generative-models.md

given

The joint distribution over two Boolean random variables A and B is given by the following table:

| A | B | P(A,B) |
|---|---|--------|
| F | F | 0.14 |
| F | T | 0.06 |
| T | F | 0.40 |
| T | T | 0.40 |

model

A and B are jointly distributed according to the table above. One natural factorization fixes the marginal of A first, then draws B from the conditional distribution of B given A.

query

A single draw from the joint distribution of (A, B). The program returns one pair per run; collect multiple seeded runs to form the empirical joint distribution. Represent the draw as a two-element list: the outcome of A first, then the outcome of B (booleans).

answer spec dist/finite

{
  "kind": "dist",
  "domain": "finite",
  "protocol": "draws",
  "support": [
    [
      true,
      true
    ],
    [
      true,
      false
    ],
    [
      false,
      true
    ],
    [
      false,
      false
    ]
  ]
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var a = flip(0.8);
var b = flip(a ? 0.5 : 0.3);
var ANSWER = ([a, b]);

◆ realization 0.035

python

# One draw from the joint of (A, B). Factorization: P(A) then P(B | A).
# Table: P(A=T)=0.40+0.40=0.80; P(B=T|A=T)=0.40/0.80=0.5; P(B=T|A=F)=0.06/0.20=0.3.
a = pyro.sample("a", dist.Bernoulli(0.8)).item() > 0
p_b = 0.5 if a else 0.3
b = pyro.sample("b", dist.Bernoulli(p_b)).item() > 0
ANSWER = [bool(a), bool(b)]

02 answer overlay — webppl vs pyro dist/finite

[chart: webppl vs pyro overlay, 4 bins]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0600 (tv)
solver re-derivation	accept	2/2 solvers · d=[0.035, 0.035] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0350 ≤ tol 0.1800 · floors 0.0450/0.0600
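
The flip weights 0.8, 0.5, and 0.3 used by both realizations follow from the joint table by marginalizing and then conditioning. A short sketch of that derivation with exact fractions (plain Python; variable names are illustrative):

```python
from fractions import Fraction

# Joint table P(A, B) from the problem statement, keyed by (A, B).
table = {
    (False, False): Fraction(14, 100),
    (False, True): Fraction(6, 100),
    (True, False): Fraction(40, 100),
    (True, True): Fraction(40, 100),
}

# Marginal of A, then the two conditionals of B given A.
p_a = sum(p for (a, _), p in table.items() if a)
p_b_given_a = table[(True, True)] / p_a
p_b_given_not_a = table[(False, True)] / (1 - p_a)

print(p_a, p_b_given_a, p_b_given_not_a)  # 4/5 1/2 3/10
```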

probmods2-generative-models / ex7.b

answer dist/finite solver accept pyro pass 0.0082

00 statement source: exercises/generative-models.md

given

Two Boolean random variables A and B have the following distribution: P(A=true) = 0.8; P(B=true | A=true) = 0.5; P(B=true | A=false) = 0.3.

model

A is drawn from its marginal; B is then drawn conditionally on A.

query

The full joint distribution over (A, B), estimated by forward sampling with 10000 samples. Represent each outcome pair as a two-element list: the outcome of A first, then the outcome of B (booleans).

answer spec dist/finite

{
  "kind": "dist",
  "domain": "finite",
  "support": [
    [
      true,
      true
    ],
    [
      true,
      false
    ],
    [
      false,
      true
    ],
    [
      false,
      false
    ]
  ]
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var ANSWER = (Infer({method: "forward", samples: 10000}, function() {
  var a = flip(0.8);
  var b = flip(a ? 0.5 : 0.3);
  return [a, b];
}));

◆ realization 0.008

python

# probmods2-generative-models/ex7.b
# A ~ Bernoulli(0.8); B | A ~ Bernoulli(0.5 if A else 0.3).
# Full joint over (A, B) estimated by forward sampling (prior) with 10000 draws.

N = 10000

def model():
    with pyro.plate("draws", N):
        a = pyro.sample("a", dist.Bernoulli(0.8))
        p_b = torch.where(a == 1.0, torch.tensor(0.5), torch.tensor(0.3))
        b = pyro.sample("b", dist.Bernoulli(p_b))
    return a, b

a, b = model()
a_bool = a.bool()
b_bool = b.bool()

support = [(True, True), (True, False), (False, True), (False, False)]
counts = {pair: 0 for pair in support}
for i in range(N):
    pair = (bool(a_bool[i].item()), bool(b_bool[i].item()))
    counts[pair] += 1

ANSWER = {pair: counts[pair] / N for pair in support}

02 answer overlay — webppl vs pyro dist/finite

[chart: webppl vs pyro overlay, 4 bins]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0179 (tv)
solver re-derivation	accept	2/2 solvers · d=[0.007, 0.007] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0082 ≤ tol 0.0358 · floors 0.0116/0.0179

probmods2-hierarchical-models / ex1

answer record(observed, usealpha) solver accept pyro pass 0.0112

00 statement source: exercises/hierarchical-models.md

given

There are five colors: black, blue, green, orange, red. In the `observed` model, the Dirichlet concentration vector is all-ones (length 5), and the observed data are three draws from bag1: blue, blue, black (in that order). In the `usealpha` model, the Dirichlet concentration vector for each bag is [2, 3, 1, 1, 1] in the order (black, blue, green, orange, red), with no observed data.

model

Each bag's color distribution is drawn independently from a Dirichlet prior parameterized by a concentration vector. Draws from a bag are conditionally independent given that bag's color distribution. Each model is run with MCMC using 20000 samples.

query

Return a record with two fields: `observed` — the posterior predictive distribution over a single color draw from bag1 under the all-ones-prior model conditioned on the three observed draws; `usealpha` — the prior predictive distribution over a single color draw from bag1 under the [2,3,1,1,1]-concentration model with no observations.

answer spec record(observed, usealpha)

{
  "kind": "record",
  "fields": {
    "observed": {
      "kind": "dist",
      "domain": "finite",
      "support": [
        "black",
        "blue",
        "green",
        "orange",
        "red"
      ]
    },
    "usealpha": {
      "kind": "dist",
      "domain": "finite",
      "support": [
        "black",
        "blue",
        "green",
        "orange",
        "red"
      ]
    }
  }
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var colors = ['black', 'blue', 'green', 'orange', 'red'];
var observedData = [{bag: 'bag1', draw: 'blue'},
                    {bag: 'bag1', draw: 'blue'},
                    {bag: 'bag1', draw: 'black'}];

var observed = Infer({method: 'MCMC', samples: 20000}, function() {
  var makeBag = mem(function(bag) {
    var colorProbs = dirichlet(ones([colors.length, 1]));
    return Categorical({vs: colors, ps: colorProbs});
  });
  var obsFn = function(datum) { observe(makeBag(datum.bag), datum.draw); };
  mapData({data: observedData}, obsFn);
  return sample(makeBag('bag1'));
});

var usealpha = Infer({method: 'MCMC', samples: 20000}, function () {
  var makeBag = mem(function(bag) {
    var colorProbs = dirichlet(Vector([2, 3, 1, 1, 1]));
    return Categorical({vs: colors, ps: colorProbs});
  });
  return sample(makeBag('bag1'));
});
var ANSWER = (({observed: observed, usealpha: usealpha}));

◆ realization 0.011

python

colors = ['black', 'blue', 'green', 'orange', 'red']

# observed: all-ones Dirichlet prior on bag1's color distribution, conditioned on
# three draws (blue, blue, black). The model draws colorProbs ~ Dirichlet, observes
# the three categorical draws, and returns a fresh draw -> posterior predictive.
# This is a Dirichlet-Categorical posterior, so we sample the latent colorProbs via
# MCMC over the model and average the predictive categorical.

def observed_model():
    alpha = torch.ones(5)
    colorProbs = pyro.sample('colorProbs', dist.Dirichlet(alpha))
    counts = {'blue': 2, 'black': 1}
    for c, n in counts.items():
        idx = colors.index(c)
        with pyro.plate('obs_' + c, n):
            pyro.sample('d_' + c, dist.Categorical(colorProbs),
                        obs=torch.full((n,), idx, dtype=torch.long))
    return colorProbs


mcmc_obs = pyro.infer.MCMC(pyro.infer.NUTS(observed_model), num_samples=800, warmup_steps=400)
mcmc_obs.run()
probs_obs = mcmc_obs.get_samples()['colorProbs'].mean(0)
observed = {c: float(probs_obs[i]) for i, c in enumerate(colors)}

# usealpha: prior predictive under Dirichlet([2,3,1,1,1]) (black,blue,green,orange,red),
# no observations. Express the model through pyro.sample: draw colorProbs from the
# Dirichlet prior and a predictive color draw from Categorical(colorProbs), then run
# it under Importance (no conditioning -> forward sampling) and read the predictive
# site's EmpiricalMarginal.
alpha2 = torch.tensor([2.0, 3.0, 1.0, 1.0, 1.0])

def usealpha_model():
    colorProbs = pyro.sample('colorProbs_ua', dist.Dirichlet(alpha2))
    draw = pyro.sample('draw_ua', dist.Categorical(colorProbs))
    return draw

posterior_ua = pyro.infer.Importance(usealpha_model, num_samples=20000).run()
marg_ua = pyro.infer.EmpiricalMarginal(posterior_ua, sites='draw_ua')
ua_samps = torch.stack([marg_ua.sample() for _ in range(20000)])
ua_counts = torch.zeros(5)
for i in range(5):
    ua_counts[i] = (ua_samps == i).sum()
probs_ua = ua_counts / ua_counts.sum()
usealpha = {c: float(probs_ua[i]) for i, c in enumerate(colors)}

ANSWER = {'observed': observed, 'usealpha': usealpha}

02 answer overlay — webppl vs pyro record(observed, usealpha)

observed

[chart: webppl vs pyro overlay, 5 bins]

usealpha

[chart: webppl vs pyro overlay, 5 bins]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0274 (record)
solver re-derivation	accept	2/2 solvers · d=[0.021, 0.033] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0112 ≤ tol 0.0548 · floors 0.0211/0.0274
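
Because the Dirichlet is conjugate to the categorical, both fields have closed forms that the MCMC estimates can be compared against. The sketch below assumes the standard Dirichlet-categorical predictive rule, (alpha_k + n_k) / (sum(alpha) + N); the function name `predictive` is hypothetical and the code is plain Python, not harness code.

```python
from fractions import Fraction

colors = ["black", "blue", "green", "orange", "red"]


def predictive(alpha, observed_draws=()):
    """Predictive P(next draw = color) under a Dirichlet(alpha) prior.

    Each color's pseudo-count alpha_k is increased by its observed count
    and normalized by the total pseudo-count plus the number of draws.
    """
    counts = {c: observed_draws.count(c) for c in colors}
    total = sum(alpha) + len(observed_draws)
    return {c: Fraction(a + counts[c], total) for c, a in zip(colors, alpha)}


observed = predictive([1, 1, 1, 1, 1], ("blue", "blue", "black"))
usealpha = predictive([2, 3, 1, 1, 1])

for c in colors:
    print(c, observed[c], usealpha[c])
```

Under this rule the two predictives coincide (black 1/4, blue 3/8, each other color 1/8), since adding the observed counts to an all-ones prior yields the same pseudo-counts as [2, 3, 1, 1, 1].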

probmods2-hierarchical-models / ex2.1

answer dist/int solver accept pyro pass 0.1792

00 statement source: exercises/hierarchical-models.md

given

Each apple in a barrel is independently rotten with probability p, where p is drawn from Beta(a=0.1, b=0.2).

model

Each barrel has its own rottenness probability drawn once from a Beta(0.1, 0.2) prior. Given that probability, each apple in the barrel is independently rotten or fresh.

query

The marginal distribution over the total number of rotten apples in a barrel of 10 apples, integrating over the barrel's rottenness probability. Use forward sampling.

answer spec dist/int

{
  "kind": "dist",
  "domain": "int"
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var makeBarrel = mem(function(barrelName) {
  var pRotten = beta({a: .1, b: .2});
  var barrel = function(n) {
    return repeat(n, function() { flip(pRotten) });
  };
  return barrel;
});
var ANSWER = (Infer({method: 'forward'}, function() {
  var barrel = makeBarrel('barrel');
  return Math.sum(barrel(10));
}));

◆ realization 0.179

python

def model():
    p_rotten = pyro.sample("pRotten", dist.Beta(0.1, 0.2))
    apples = pyro.sample("apples", dist.Bernoulli(p_rotten.expand([10])).to_event(1))
    total = apples.sum()
    return pyro.deterministic("total", total)

# Forward sampling (no conditioning): draw from the prior and aggregate.
post = pyro.infer.Importance(model, num_samples=4000).run()
ANSWER = pyro.infer.EmpiricalMarginal(post, "total")

02 answer overlay — webppl vs pyro dist/int

[chart: webppl vs pyro overlay, 11 bins · 0 … 10]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.2100 (w1)
solver re-derivation	accept	2/2 solvers · d=[0.194, 0.194] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.1792 ≤ tol 0.4200 · floors 0.1517/0.2100
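
Integrating the barrel's rottenness probability out of the model gives a beta-binomial distribution, so the forward-sampled histogram has an exact reference. A sketch using only the standard library follows; the helper names `log_beta` and `beta_binomial_pmf` are hypothetical.

```python
import math


def log_beta(a, b):
    """log B(a, b), computed through log-gamma for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(k, n, a, b):
    """P(k rotten out of n) when p ~ Beta(a, b) and apples are iid given p."""
    return math.comb(n, k) * math.exp(log_beta(k + a, n - k + b) - log_beta(a, b))


pmf = [beta_binomial_pmf(k, 10, 0.1, 0.2) for k in range(11)]
mean = sum(k * p for k, p in enumerate(pmf))

for k, p in enumerate(pmf):
    print(k, round(p, 4))
print("mean:", round(mean, 4))
```

With both Beta parameters below 1 the mass concentrates at the extremes (0 and 10 rotten apples), and the mean is 10 · a / (a + b) = 10/3.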

probmods2-hierarchical-models / ex2.2

answer record(sameStore, differentStore) solver accept pyro pass 0.0402

00 statement source: exercises/hierarchical-models.md

given

Each store independently draws its type from a 50/50 mixture: fresh stores use Beta(a=0.1, b=0.3) for apple rottenness probability; rotten stores use Beta(a=0.3, b=0.1). All barrels from the same store share that store's type. Each barrel within a store draws its own rottenness probability from the store's Beta. Each apple in a barrel is independently rotten or fresh with the barrel's rottenness probability.

model

A two-level hierarchy: a store's type is drawn once from a 50/50 prior, determining the Beta distribution from which each of the store's barrels draws its rottenness probability. Given the barrel's probability, each apple is independently rotten or fresh.

query

Two distributions over the absolute difference in rotten count between two barrels of 10 apples each, estimated with 10000 forward samples each: one where both barrels come from the same store, and one where the barrels come from two different stores.

answer spec record(sameStore, differentStore)

{
  "kind": "record",
  "fields": {
    "sameStore": {
      "kind": "dist",
      "domain": "int"
    },
    "differentStore": {
      "kind": "dist",
      "domain": "int"
    }
  }
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var makeStore = mem(function(storeName) {
  var storePrior = flip() ? {a: .1, b: .3} : {a: .3, b: .1};
  var makeBarrel = mem(function(barrelName) {
    var pRotten = beta(storePrior);
    var barrel = function(n) {
      return repeat(n, function() { flip(pRotten) });
    };
    return barrel;
  });
  return makeBarrel;
});
var ANSWER = (({
  sameStore: Infer({method: 'forward', samples: 10000}, function() {
    var S = makeStore('S');
    var B1 = S('B1');
    var B2 = S('B2');
    return Math.abs(Math.sum(B1(10)) - Math.sum(B2(10)));
  }),
  differentStore: Infer({method: 'forward', samples: 10000}, function() {
    var S1 = makeStore('S1');
    var S2 = makeStore('S2');
    var B1 = S1('B1');
    var B2 = S2('B2');
    return Math.abs(Math.sum(B1(10)) - Math.sum(B2(10)));
  })
}));

◆ realization 0.040

python

# Two store types, 50/50: fresh -> Beta(0.1,0.3); rotten -> Beta(0.3,0.1).
# All barrels in a store share the store type; each barrel draws its own
# pRotten from the store's Beta; each of 10 apples is Bernoulli(pRotten).
# Query: forward-sampled distribution of |sum(B1) - sum(B2)|, 10000 samples,
#   (a) both barrels from the SAME store, (b) barrels from DIFFERENT stores.
# No conditioning -> pure forward simulation of the generative model.

n_samples = 10000

def barrel_count(name, a, b):
    # one barrel: draw pRotten ~ Beta(a,b), then 10 apples ~ Bernoulli(pRotten)
    p_rotten = pyro.sample(f"{name}_p", dist.Beta(torch.tensor(a), torch.tensor(b)))
    apples = pyro.sample(f"{name}_apples",
                         dist.Bernoulli(p_rotten).expand([10]).to_event(1))
    return int(apples.sum().item())

def same_store_model():
    # both barrels share one store's type
    fresh = pyro.sample("store_fresh", dist.Bernoulli(0.5))
    a, b = (0.1, 0.3) if fresh.item() > 0.5 else (0.3, 0.1)
    c1 = barrel_count("B1", a, b)
    c2 = barrel_count("B2", a, b)
    return abs(c1 - c2)

def different_store_model():
    fresh1 = pyro.sample("store1_fresh", dist.Bernoulli(0.5))
    a1, b1 = (0.1, 0.3) if fresh1.item() > 0.5 else (0.3, 0.1)
    fresh2 = pyro.sample("store2_fresh", dist.Bernoulli(0.5))
    a2, b2 = (0.1, 0.3) if fresh2.item() > 0.5 else (0.3, 0.1)
    c1 = barrel_count("B1", a1, b1)
    c2 = barrel_count("B2", a2, b2)
    return abs(c1 - c2)

def forward_dist(model_fn):
    counts = Counter()
    for _ in range(n_samples):
        val = model_fn()
        counts[int(val)] += 1
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

ANSWER = {
    "sameStore": forward_dist(same_store_model),
    "differentStore": forward_dist(different_store_model),
}

02 answer overlay — webppl vs pyro record(sameStore, differentStore)

sameStore

[chart: webppl vs pyro overlay, 11 bins · 0 … 10]

differentStore

[chart: webppl vs pyro overlay, 11 bins · 0 … 10]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0705 (record)
solver re-derivation	accept	2/2 solvers · d=[0.033, 0.084] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0402 ≤ tol 0.2920 · floors 0.1460/0.0705
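
The qualitative point of this exercise is that barrels from the same store should differ less than barrels from different stores, because they share a store type. The seeded simulation below is a dependency-free sketch of the same generative process (standard-library `random` only); the function names and the seed are arbitrary choices for illustration, not harness code.

```python
import random


def store_prior(rng):
    """Draw a store type: fresh -> Beta(0.1, 0.3), rotten -> Beta(0.3, 0.1)."""
    return (0.1, 0.3) if rng.random() < 0.5 else (0.3, 0.1)


def barrel_count(rng, a, b, n_apples=10):
    """One barrel: draw its rottenness probability, then count rotten apples."""
    p_rotten = rng.betavariate(a, b)
    return sum(rng.random() < p_rotten for _ in range(n_apples))


def mean_abs_difference(rng, same_store, n_samples=10000):
    """Average |count(B1) - count(B2)| over forward samples."""
    total = 0
    for _ in range(n_samples):
        prior1 = store_prior(rng)
        # Same store: reuse the store's Beta; different stores: draw a new type.
        prior2 = prior1 if same_store else store_prior(rng)
        total += abs(barrel_count(rng, *prior1) - barrel_count(rng, *prior2))
    return total / n_samples


rng = random.Random(0)  # arbitrary seed, only for reproducibility
same = mean_abs_difference(rng, same_store=True)
different = mean_abs_difference(rng, same_store=False)
print("mean |difference|, same store:", same)
print("mean |difference|, different stores:", different)
```

Only the mean of each distribution is reported here; the full histograms would be built by counting the individual differences instead of summing them.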

probmods2-hierarchical-models / ex2.3

answer dist/int solver accept pyro pass 0.4522

00 statement source: exercises/hierarchical-models.md

given

A three-level hierarchy of cities, stores, and barrels. Each city has a probability p_city drawn from Beta(a=0.25, b=0.25) that a store is the fresh type. A fresh-type store uses Beta(a=0.1, b=0.3) for apple rottenness probability; a rotten-type store uses Beta(a=0.3, b=0.1). Each barrel within a store draws its own rottenness probability from the store's Beta. Each apple in a barrel is independently rotten or fresh with the barrel's rottenness probability.

model

A three-level Bayesian hierarchy: city-level type probability determines the store's type, which determines the distribution from which each barrel's rottenness probability is drawn, which determines whether each apple is rotten.

query

The marginal distribution over the total number of rotten apples in a 20-apple barrel drawn from one store in one city, integrating over all three levels of the hierarchy. Use forward sampling.

answer spec dist/int

{
  "kind": "dist",
  "domain": "int"
}

01 realizations comparing webppl vs pyro

◆ground truth

webppl

1var makeCity = mem(function(cityName){
2  var cityPrior = beta({a: .25, b: .25});
3  var makeStore = mem(function(storeName) {
4    var storePrior = flip(cityPrior) ? {a: .1, b: .3} : {a: .3, b: .1};
5    var makeBarrel = mem(function(barrelName) {
6      var pRotten = beta(storePrior);
7      var barrel = function(n) {
8        return repeat(n, function() { flip(pRotten) });
9      };
10      return barrel;
11    });
12    return makeBarrel;
13  });
14  return makeStore;
15});
16
17var ANSWER = (Infer({method: 'forward'}, function(){
18    var C1 = makeCity("C1");
19    var S1 = C1("S1");
20    var B1 = S1("B1");
21    return Math.sum(B1(20));
22}));

◆realization0.452

python

import torch
import pyro
import pyro.distributions as dist

def model():
    city_prior = pyro.sample("cityPrior", dist.Beta(0.25, 0.25))
    is_fresh = pyro.sample("storeType", dist.Bernoulli(city_prior))
    a = torch.where(is_fresh.bool(), torch.tensor(0.1), torch.tensor(0.3))
    b = torch.where(is_fresh.bool(), torch.tensor(0.3), torch.tensor(0.1))
    p_rotten = pyro.sample("pRotten", dist.Beta(a, b))
    apples = pyro.sample("apples", dist.Bernoulli(p_rotten.expand([20])).to_event(1))
    total = apples.sum()
    return pyro.deterministic("total", total)

# Forward sampling over all three hierarchy levels (no conditioning).
post = pyro.infer.Importance(model, num_samples=4000).run()
ANSWER = pyro.infer.EmpiricalMarginal(post, "total")

02 answer overlay: webppl vs pyro (dist/int)

[histogram overlay: webppl and pyro · 21 bins · 0 … 20]

03 verification

check	status	evidence
GT self-consistency	ok	floor 1.4000 (w1)
solver re-derivation	accept	2/2 solvers · d=[0.401, 0.431] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.4522 ≤ tol 2.8000 · floors 0.4063/1.4000

probmods2-hierarchical-models / ex2.4

answer dist/int solver accept pyro pass 0.3331

00 statement source: exercises/hierarchical-models.md

given

A three-level hierarchy as in the previous exercise: each city has p_city drawn from Beta(a=0.25, b=0.25); stores within a city are fresh type (Beta(a=0.1, b=0.3)) with probability p_city, else rotten type (Beta(a=0.3, b=0.1)); each barrel in a store draws its own rottenness probability from the store's Beta; apples are independently rotten given the barrel's probability. You observe a first barrel of 10 apples from one store in a city, and 7 of those apples are rotten.

model

Condition the three-level hierarchy on the observation. Infer the posterior over the number of rotten apples in a 10-apple barrel from a different store in the same city, given the observation from the first store's barrel.

query

The posterior distribution over the number of rotten apples in a 10-apple barrel from the second store, conditioned on 7 of 10 apples being rotten in the first store's barrel. Use MCMC with 5000 samples and a lag of 100.
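
One way to picture the conditioning step is likelihood weighting: sample both stores from the prior and weight each draw by the probability of seeing 7 rotten apples out of 10 in the first barrel. The sketch below uses only the standard library and is an illustration, not the MCMC procedure the query asks for; `binom_pmf`, `store_barrel_p`, and `posterior_predictive` are hypothetical helper names.

```python
import math
import random

def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1.0 - p)**(n - k)

def store_barrel_p(rng, p_city):
    """Sample a store type for this city, then one barrel's rottenness probability."""
    is_fresh = rng.random() < p_city
    a, b = (0.1, 0.3) if is_fresh else (0.3, 0.1)
    return rng.betavariate(a, b)

def posterior_predictive(num_samples=40000, seed=0):
    """Weighted histogram over the second barrel's rotten count (0..10)."""
    rng = random.Random(seed)
    weighted = [0.0] * 11
    for _ in range(num_samples):
        p_city = rng.betavariate(0.25, 0.25)   # shared by both stores
        p1 = store_barrel_p(rng, p_city)       # observed barrel, store 1
        p2 = store_barrel_p(rng, p_city)       # query barrel, store 2
        weight = binom_pmf(7, 10, p1)          # likelihood of 7 rotten out of 10
        b2 = sum(rng.random() < p2 for _ in range(10))
        weighted[b2] += weight
    total = sum(weighted)
    return [w / total for w in weighted]

predictive = posterior_predictive()
```

The two stores are coupled only through the shared city-level draw, which is why the observation from the first store shifts the prediction for the second at all.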

answer spec dist/int

{
  "kind": "dist",
  "domain": "int"
}

01 realizations comparing webppl vs pyro

◆ground truth

webppl

var makeCity = mem(function(cityName){
    var cityPrior = beta({a: .25, b: .25});

    var makeStore = mem(function(storeName) {
        var storePrior = flip(cityPrior) ? {a: .1, b: .3} : {a: .3, b: .1};

        var makeBarrel = mem(function(barrelName) {
            var pRotten = beta(storePrior);
            var barrel = function(n) {
                return repeat(n, function() { flip(pRotten) });
            };
            return barrel;
        });

        return makeBarrel;
    });

    return makeStore;
});
var ANSWER = (Infer({method: 'MCMC', samples:5000, lag: 100}, function(){
    var C = makeCity("C");
    var S1 = C("S1");
    var B1 = S1("B1");
    var S2 = C("S2");
    var B2 = S2("B2");

    condition(Math.sum(B1(10)) == 7);

    return Math.sum(B2(10));
}));

◆realization 0.333

python

import torch
import pyro
import pyro.distributions as dist

# Mixed discrete (store-type flips) + continuous (cityPrior, pRotten). The discrete
# flips are concrete samples under Importance (no enumeration needed); the obs1=7
# Binomial likelihood is not extreme, so Importance over the prior recovers the
# posterior. Query = posterior-predictive over the 2nd barrel's rotten count.
def store_params(flip):
    return (0.1, 0.3) if bool(flip.item()) else (0.3, 0.1)   # fresh : rotten

def model():
    cityPrior = pyro.sample("cityPrior", dist.Beta(0.25, 0.25))
    f1 = pyro.sample("f1", dist.Bernoulli(cityPrior))
    a1, b1 = store_params(f1)
    pRotten1 = pyro.sample("pRotten1", dist.Beta(a1, b1))
    pyro.sample("obs1", dist.Binomial(10, pRotten1), obs=torch.tensor(7.0))
    f2 = pyro.sample("f2", dist.Bernoulli(cityPrior))
    a2, b2 = store_params(f2)
    pRotten2 = pyro.sample("pRotten2", dist.Beta(a2, b2))
    return pyro.sample("B2", dist.Binomial(10, pRotten2))

post = pyro.infer.Importance(model, num_samples=40000).run()
ANSWER = pyro.infer.EmpiricalMarginal(post)

02 answer overlay: webppl vs pyro (dist/int)

[histogram overlay: webppl and pyro · 11 bins · 0 … 10]

03 verification

check	status	evidence
GT self-consistency	ok	floor 1.1702 (w1)
solver re-derivation	accept	2/2 solvers · d=[0.306, 0.364] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.3331 ≤ tol 2.3404 · floors 0.1421/1.1702

probmods2-hierarchical-models / ex3.1

answer dist/real solver accept pyro pass 1.1237

00 statement source: exercises/hierarchical-models.md

given

Reading-time data: 24 observations across 6 words in two groups. Vowel-initial words: abacus (ids 1,2,3 with rts 210,212,209), aardvark (ids 1,2,3 with rts 200,201,198), ellipse (ids 1,2,3 with rts 220,222,219). Consonant-initial words: proton (ids 1,2,3 with rts 190,191,189), folder (ids 1,2,3 with rts 180,182,178), fedora (three replicates: ids 1,2,3 with rts 230,231,228; then 231,233,230; then 230,232,228). Each group's mean reading time has a Gaussian(200, 100) prior. Each word's mean reading time is drawn from a Gaussian centered at its group's mean with standard deviation 20. Each observed reading time is drawn from a Gaussian centered at the word's mean with standard deviation 10.

model

A two-level hierarchical Gaussian model: group-level mean reading times are latent, word-level means are drawn from the group mean, and observed reading times are drawn from the word mean.

query

The posterior distribution over the difference in group mean reading time (vowel minus consonant), given all observations. Use MCMC with 5000 samples, a burn-in of 10000, and a lag of 5.
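
Since every level of this model is Gaussian, the target posterior can also be written in closed form, which gives a sampler-independent reference for the MCMC output. The sketch below assumes the model exactly as stated in 'given': integrating out a word's mean leaves that word's average reading time distributed as Normal(group mean, 20² + 10²/n). The names `rts` and `group_posterior` are illustrative.

```python
import math

# Reading times grouped by (group, word), copied from the data in 'given'.
rts = {
    ("vowel", "abacus"): [210, 212, 209],
    ("vowel", "aardvark"): [200, 201, 198],
    ("vowel", "ellipse"): [220, 222, 219],
    ("consonant", "proton"): [190, 191, 189],
    ("consonant", "folder"): [180, 182, 178],
    ("consonant", "fedora"): [230, 231, 228, 231, 233, 230, 230, 232, 228],
}

PRIOR_MU, PRIOR_SD = 200.0, 100.0   # group-mean prior
WORD_SD, OBS_SD = 20.0, 10.0        # word-level and observation-level sds

def group_posterior(group):
    """Exact Gaussian posterior (mean, variance) of one group's mean."""
    precision = 1.0 / PRIOR_SD**2
    weighted_sum = PRIOR_MU / PRIOR_SD**2
    for (g, _word), ys in rts.items():
        if g != group:
            continue
        # Variance of the word's sample average around the group mean.
        var = WORD_SD**2 + OBS_SD**2 / len(ys)
        precision += 1.0 / var
        weighted_sum += (sum(ys) / len(ys)) / var
    return weighted_sum / precision, 1.0 / precision

mu_v, var_v = group_posterior("vowel")
mu_c, var_c = group_posterior("consonant")
diff_mean = mu_v - mu_c                 # posterior mean of vowel minus consonant
diff_sd = math.sqrt(var_v + var_c)      # groups are independent a posteriori
```

The two group means share no latent variables or observations, so their posteriors are independent and the variance of the difference is the sum of the two variances.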

answer spec dist/real

{
  "kind": "dist",
  "domain": "real"
}

01 realizations comparing webppl vs pyro

◆ground truth

webppl

var data = [{group: "vowel", word: "abacus", id: 1, rt: 210},
            {group: "vowel", word: "abacus", id: 2, rt: 212},
            {group: "vowel", word: "abacus", id: 3, rt: 209},
            {group: "vowel", word: "aardvark", id: 1, rt: 200},
            {group: "vowel", word: "aardvark", id: 2, rt: 201},
            {group: "vowel", word: "aardvark", id: 3, rt: 198},
            {group: "vowel", word: "ellipse", id: 1, rt: 220},
            {group: "vowel", word: "ellipse", id: 2, rt: 222},
            {group: "vowel", word: "ellipse", id: 3, rt: 219},
            {group: "consonant", word: "proton", id: 1, rt: 190},
            {group: "consonant", word: "proton", id: 2, rt: 191},
            {group: "consonant", word: "proton", id: 3, rt: 189},
            {group: "consonant", word: "folder", id: 1, rt: 180},
            {group: "consonant", word: "folder", id: 2, rt: 182},
            {group: "consonant", word: "folder", id: 3, rt: 178},
            {group: "consonant", word: "fedora", id: 1, rt: 230},
            {group: "consonant", word: "fedora", id: 2, rt: 231},
            {group: "consonant", word: "fedora", id: 3, rt: 228},
            {group: "consonant", word: "fedora", id: 1, rt: 231},
            {group: "consonant", word: "fedora", id: 2, rt: 233},
            {group: "consonant", word: "fedora", id: 3, rt: 230},
            {group: "consonant", word: "fedora", id: 1, rt: 230},
            {group: "consonant", word: "fedora", id: 2, rt: 232},
            {group: "consonant", word: "fedora", id: 3, rt: 228}];

var opts = {method: "MCMC", burn: 10000, lag: 5, samples: 5000};
var ANSWER = (Infer(opts, function() {
  var groupMeans = {vowel: gaussian(200, 100),
                    consonant: gaussian(200, 100)};

  var wordMean = mem(function(word, group) {
    return gaussian(groupMeans[group], 20);
  });

  var obsFn = function(d) {
    observe(Gaussian({mu: wordMean(d.word, d.group),
                      sigma: 10}), d.rt);
  };

  mapData({data: data}, obsFn);

  return groupMeans['vowel'] - groupMeans['consonant'];
}));

◆realization 1.124

python

# Two-level hierarchical Gaussian. Group means ~ Normal(200,100); word means ~
# Normal(groupMean, 20); observed rt ~ Normal(wordMean, 10). All latents continuous,
# so NUTS. Query: posterior over (vowel groupMean - consonant groupMean).
import torch
import pyro
import pyro.distributions as dist

data = [
    ("vowel", "abacus", 210.0), ("vowel", "abacus", 212.0), ("vowel", "abacus", 209.0),
    ("vowel", "aardvark", 200.0), ("vowel", "aardvark", 201.0), ("vowel", "aardvark", 198.0),
    ("vowel", "ellipse", 220.0), ("vowel", "ellipse", 222.0), ("vowel", "ellipse", 219.0),
    ("consonant", "proton", 190.0), ("consonant", "proton", 191.0), ("consonant", "proton", 189.0),
    ("consonant", "folder", 180.0), ("consonant", "folder", 182.0), ("consonant", "folder", 178.0),
    ("consonant", "fedora", 230.0), ("consonant", "fedora", 231.0), ("consonant", "fedora", 228.0),
    ("consonant", "fedora", 231.0), ("consonant", "fedora", 233.0), ("consonant", "fedora", 230.0),
    ("consonant", "fedora", 230.0), ("consonant", "fedora", 232.0), ("consonant", "fedora", 228.0),
]

groups = ["vowel", "consonant"]
words = sorted({(g, w) for (g, w, _) in data})

def model():
    groupMeans = {g: pyro.sample(f"group_{g}", dist.Normal(200.0, 100.0)) for g in groups}
    wordMeans = {}
    for (g, w) in words:
        wordMeans[(g, w)] = pyro.sample(f"word_{g}_{w}", dist.Normal(groupMeans[g], 20.0))
    for i, (g, w, rt) in enumerate(data):
        pyro.sample(f"obs_{i}", dist.Normal(wordMeans[(g, w)], 10.0), obs=torch.tensor(rt))

kernel = pyro.infer.NUTS(model)
mcmc = pyro.infer.MCMC(kernel, num_samples=2000, warmup_steps=1000)
mcmc.run()
s = mcmc.get_samples()
ANSWER = (s["group_vowel"] - s["group_consonant"]).tolist()

02 answer overlay: webppl vs pyro (dist/real)

[histogram overlay: webppl and pyro · 1443 bins · -43.2 … 64.1]

03 verification

check	status	evidence
GT self-consistency	ok	floor 2.4346 (w1)
solver re-derivation	accept	2/2 solvers · d=[1.500, 1.500] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=1.1237 ≤ tol 4.8692 · floors 0.9237/2.4346

probmods2-hierarchical-models / ex3.2

answer dist/real solver accept pyro pass 1.4079

00 statement source: exercises/hierarchical-models.md

given

Reading-time experiment with 24 observations across two word-onset groups. Each observation has fields: group ("vowel" or "consonant"), word, id (participant 1, 2, or 3), and rt (reading time in ms). The 24 rows are: {group: "vowel", word: "abacus", id: 1, rt: 210}, {group: "vowel", word: "abacus", id: 2, rt: 212}, {group: "vowel", word: "abacus", id: 3, rt: 209}, {group: "vowel", word: "aardvark", id: 1, rt: 200}, {group: "vowel", word: "aardvark", id: 2, rt: 201}, {group: "vowel", word: "aardvark", id: 3, rt: 198}, {group: "vowel", word: "ellipse", id: 1, rt: 220}, {group: "vowel", word: "ellipse", id: 2, rt: 222}, {group: "vowel", word: "ellipse", id: 3, rt: 219}, {group: "consonant", word: "proton", id: 1, rt: 190}, {group: "consonant", word: "proton", id: 2, rt: 191}, {group: "consonant", word: "proton", id: 3, rt: 189}, {group: "consonant", word: "folder", id: 1, rt: 180}, {group: "consonant", word: "folder", id: 2, rt: 182}, {group: "consonant", word: "folder", id: 3, rt: 178}, {group: "consonant", word: "fedora", id: 1, rt: 230}, {group: "consonant", word: "fedora", id: 2, rt: 231}, {group: "consonant", word: "fedora", id: 3, rt: 228}, {group: "consonant", word: "fedora", id: 1, rt: 231}, {group: "consonant", word: "fedora", id: 2, rt: 233}, {group: "consonant", word: "fedora", id: 3, rt: 230}, {group: "consonant", word: "fedora", id: 1, rt: 230}, {group: "consonant", word: "fedora", id: 2, rt: 232}, {group: "consonant", word: "fedora", id: 3, rt: 228}. Priors: each group's mean reading time has a Gaussian(200, 100) prior. Each word's mean is drawn from a Gaussian centered at the group mean with sd=20. Each participant has an additive offset drawn from Gaussian(0, 2). Observed reading times are drawn from a Gaussian centered at the word mean plus participant offset with sd=10.

model

A three-level hierarchical Gaussian model: group-level means, word-level means centered on the group mean, and participant-level additive offsets. Each observed reading time is drawn from a Gaussian at the sum of the word mean and participant offset.

query

The posterior marginal distribution over the group mean difference (vowel minus consonant), integrating over word means and participant offsets. Use MCMC with 5000 samples, a burn-in of 10000, and a lag of 5.
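
With participant offsets added, the latents are still jointly Gaussian, so the marginal over the group difference can be computed exactly by assembling the joint precision matrix (information form) and solving a small linear system. The sketch below is an illustrative standard-library reference, not the MCMC procedure the query specifies. The helpers `add_factor` and `solve` are hypothetical names, and the participant-id assignment assumes ids cycle 1, 2, 3 within each word, as in the listed rows.

```python
import math

words = {  # word -> (group, reading times in row order; ids cycle 1, 2, 3)
    "abacus": ("vowel", [210, 212, 209]),
    "aardvark": ("vowel", [200, 201, 198]),
    "ellipse": ("vowel", [220, 222, 219]),
    "proton": ("consonant", [190, 191, 189]),
    "folder": ("consonant", [180, 182, 178]),
    "fedora": ("consonant", [230, 231, 228, 231, 233, 230, 230, 232, 228]),
}
names = ["vowel", "consonant"] + list(words) + ["p1", "p2", "p3"]
idx = {name: i for i, name in enumerate(names)}
n = len(names)
Lam = [[0.0] * n for _ in range(n)]   # joint precision matrix
h = [0.0] * n                         # precision-weighted mean vector

def add_factor(coeffs, target, sd):
    """Add the factor Normal(sum(c_i * z_i); target, sd) in information form."""
    for i, ci in coeffs.items():
        h[idx[i]] += ci * target / sd**2
        for j, cj in coeffs.items():
            Lam[idx[i]][idx[j]] += ci * cj / sd**2

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, m):
            f = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * m
    for r in reversed(range(m)):
        x[r] = (M[r][m] - sum(M[r][c] * x[c] for c in range(r + 1, m))) / M[r][r]
    return x

for g in ("vowel", "consonant"):
    add_factor({g: 1.0}, 200.0, 100.0)             # group-mean prior
for p in ("p1", "p2", "p3"):
    add_factor({p: 1.0}, 0.0, 2.0)                 # participant offset prior
for w, (g, ys) in words.items():
    add_factor({w: 1.0, g: -1.0}, 0.0, 20.0)       # word mean around group mean
    for k, rt in enumerate(ys):
        add_factor({w: 1.0, f"p{k % 3 + 1}": 1.0}, float(rt), 10.0)  # observation

mean = solve(Lam, h)
contrast = [0.0] * n
contrast[idx["vowel"]], contrast[idx["consonant"]] = 1.0, -1.0
diff_mean = mean[idx["vowel"]] - mean[idx["consonant"]]
diff_sd = math.sqrt(sum(c * s for c, s in zip(contrast, solve(Lam, contrast))))
```

Each participant appears equally often in every word, so the shared offsets largely cancel in the vowel-minus-consonant contrast.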

answer spec dist/real

{
  "kind": "dist",
  "domain": "real"
}

01 realizations comparing webppl vs pyro

◆ground truth

webppl

var data = [{group: "vowel", word: "abacus", id: 1, rt: 210},
            {group: "vowel", word: "abacus", id: 2, rt: 212},
            {group: "vowel", word: "abacus", id: 3, rt: 209},
            {group: "vowel", word: "aardvark", id: 1, rt: 200},
            {group: "vowel", word: "aardvark", id: 2, rt: 201},
            {group: "vowel", word: "aardvark", id: 3, rt: 198},
            {group: "vowel", word: "ellipse", id: 1, rt: 220},
            {group: "vowel", word: "ellipse", id: 2, rt: 222},
            {group: "vowel", word: "ellipse", id: 3, rt: 219},
            {group: "consonant", word: "proton", id: 1, rt: 190},
            {group: "consonant", word: "proton", id: 2, rt: 191},
            {group: "consonant", word: "proton", id: 3, rt: 189},
            {group: "consonant", word: "folder", id: 1, rt: 180},
            {group: "consonant", word: "folder", id: 2, rt: 182},
            {group: "consonant", word: "folder", id: 3, rt: 178},
            {group: "consonant", word: "fedora", id: 1, rt: 230},
            {group: "consonant", word: "fedora", id: 2, rt: 231},
            {group: "consonant", word: "fedora", id: 3, rt: 228},
            {group: "consonant", word: "fedora", id: 1, rt: 231},
            {group: "consonant", word: "fedora", id: 2, rt: 233},
            {group: "consonant", word: "fedora", id: 3, rt: 230},
            {group: "consonant", word: "fedora", id: 1, rt: 230},
            {group: "consonant", word: "fedora", id: 2, rt: 232},
            {group: "consonant", word: "fedora", id: 3, rt: 228}];

var opts = {method: "MCMC", burn: 10000, lag: 5, samples: 5000};
var joint = Infer(opts, function() {
  var groupMeans = {vowel: gaussian(200, 100),
                    consonant: gaussian(200, 100)};

  var participantMean = mem(function(pid) {
    return gaussian(0, 2);
  });

  var wordMean = mem(function(word, group) {
    return gaussian(groupMeans[group], 20);
  });

  var obsFn = function(d) {
    observe(Gaussian({mu: wordMean(d.word, d.group) + participantMean(d.id),
                      sigma: 10}), d.rt);
  };

  mapData({data: data}, obsFn);

  return {diff: groupMeans['vowel'] - groupMeans['consonant'],
          p1: participantMean(1),
          p2: participantMean(2),
          p3: participantMean(3)};
});
var ANSWER = marginalize(joint, function(x) { return x.diff; });

◆realization 1.408

python

# Three-level hierarchical Gaussian: group means ~ Normal(200,100); word means ~
# Normal(groupMean, 20); participant offsets ~ Normal(0, 2); observed rt ~
# Normal(wordMean + participantOffset, 10). All latents continuous -> NUTS.
# Query: posterior marginal over (vowel groupMean - consonant groupMean).
import torch
import pyro
import pyro.distributions as dist

data = [
    ("vowel", "abacus", 1, 210.0), ("vowel", "abacus", 2, 212.0), ("vowel", "abacus", 3, 209.0),
    ("vowel", "aardvark", 1, 200.0), ("vowel", "aardvark", 2, 201.0), ("vowel", "aardvark", 3, 198.0),
    ("vowel", "ellipse", 1, 220.0), ("vowel", "ellipse", 2, 222.0), ("vowel", "ellipse", 3, 219.0),
    ("consonant", "proton", 1, 190.0), ("consonant", "proton", 2, 191.0), ("consonant", "proton", 3, 189.0),
    ("consonant", "folder", 1, 180.0), ("consonant", "folder", 2, 182.0), ("consonant", "folder", 3, 178.0),
    ("consonant", "fedora", 1, 230.0), ("consonant", "fedora", 2, 231.0), ("consonant", "fedora", 3, 228.0),
    ("consonant", "fedora", 1, 231.0), ("consonant", "fedora", 2, 233.0), ("consonant", "fedora", 3, 230.0),
    ("consonant", "fedora", 1, 230.0), ("consonant", "fedora", 2, 232.0), ("consonant", "fedora", 3, 228.0),
]

groups = ["vowel", "consonant"]
words = sorted({(g, w) for (g, w, _, _) in data})
pids = sorted({p for (_, _, p, _) in data})

def model():
    groupMeans = {g: pyro.sample(f"group_{g}", dist.Normal(200.0, 100.0)) for g in groups}
    participant = {p: pyro.sample(f"part_{p}", dist.Normal(0.0, 2.0)) for p in pids}
    wordMeans = {(g, w): pyro.sample(f"word_{g}_{w}", dist.Normal(groupMeans[g], 20.0))
                 for (g, w) in words}
    for i, (g, w, p, rt) in enumerate(data):
        mu = wordMeans[(g, w)] + participant[p]
        pyro.sample(f"obs_{i}", dist.Normal(mu, 10.0), obs=torch.tensor(rt))

kernel = pyro.infer.NUTS(model)
mcmc = pyro.infer.MCMC(kernel, num_samples=2000, warmup_steps=1000)
mcmc.run()
s = mcmc.get_samples()
ANSWER = (s["group_vowel"] - s["group_consonant"]).tolist()

02 answer overlay: webppl vs pyro (dist/real)

[histogram overlay: webppl and pyro · 1236 bins · -42.7 … 69.3]

03 verification

check	status	evidence
GT self-consistency	ok	floor 3.7376 (w1)
solver re-derivation	accept	2/2 solvers · d=[2.061, 2.061] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=1.4079 ≤ tol 7.4751 · floors 0.7845/3.7376

probmods2-inference-algorithms / ex1.1

answer dist/real solver accept pyro unavailable

00 statement source: exercises/inference-algorithms.md

given

Heart-shaped implicit curve: a point (x, y) lies on the curve if |x^2 + (y - x^(2/3))^2 - 1| < 0.01. Priors: x ~ Gaussian(0, 1) and y ~ Gaussian(0.3, 1.3), where the means and standard deviations are the midpoints and half-widths of the bounding boxes [-1, 1] for x and [-1, 1.6] for y.

model

Draw x and y independently from their respective Gaussian priors and condition on the point lying on the heart curve.

query

The marginal posterior distribution over x, obtained by running MCMC with 10000 samples and lag 10 on the joint model and then marginalizing to x.
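
For intuition about the target, the same conditioned model can be sampled by plain rejection: draw from the two Gaussian priors and keep only points inside the band. This swaps rejection sampling in for the MCMC the query pins, so it is a sampler-independent illustration rather than a reproduction of the ground truth. The names `on_curve` and `rejection_sample`, the seed, and the number of accepted draws are assumptions.

```python
import random

def on_curve(x, y, tol=0.01):
    """Heart-curve membership test, mirroring the onCurve predicate."""
    x2 = x * x
    term = y - x2 ** (1.0 / 3.0)       # (x^2)^(1/3) stays real for negative x
    return abs(x2 + term * term - 1.0) < tol

def rejection_sample(num_accept=500, seed=0):
    """Propose from the priors until num_accept points land on the curve."""
    rng = random.Random(seed)
    accepted, proposed = [], 0
    while len(accepted) < num_accept:
        x = rng.gauss(0.0, 1.0)        # x ~ Gaussian(0, 1)
        y = rng.gauss(0.3, 1.3)        # y ~ Gaussian(0.3, 1.3)
        proposed += 1
        if on_curve(x, y):
            accepted.append((x, y))
    return accepted, proposed

samples, proposed = rejection_sample()
acceptance_rate = len(samples) / proposed
x_marginal = [x for x, _ in samples]
x_mean = sum(x_marginal) / len(x_marginal)
```

The acceptance rate is low because the band is only about 0.01 wide, which is the same feature that makes this target awkward for every sampler discussed in this family of exercises.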

answer spec dist/real

{
  "kind": "dist",
  "domain": "real"
}

01 realizations comparing webppl vs pyro

◆ground truth

webppl

var onCurve = function(x, y) {
  var x2 = x*x;
  var term1 = y - Math.pow(x2, 1/3);
  var crossSection = x2 + term1*term1 - 1;
  return Math.abs(crossSection) < 0.01;
};
var xbounds = [-1, 1];
var ybounds = [-1, 1.6];

var xmu = 0.5 * (xbounds[0] + xbounds[1]);
var ymu = 0.5 * (ybounds[0] + ybounds[1]);
var xsigma = 0.5 * (xbounds[1] - xbounds[0]);
var ysigma = 0.5 * (ybounds[1] - ybounds[0]);

var model = function() {
  var x = gaussian(xmu, xsigma);
  var y = gaussian(ymu, ysigma);
  condition(onCurve(x, y));
  return {x: x, y: y};
};
var posterior = Infer({method: 'MCMC',
       samples: 10000,
       lag: 10}, model);
var ANSWER = marginalize(posterior, "x");

◆realization

unavailable in pyro

The query pins MCMC/MH (10000 samples, lag 10). Because it fixes the inference method and its settings, the target is sampler-specific rather than a determinate posterior, which puts it outside the determination criterion. The model is also gradient-hostile in Pyro: x^(2/3) has a singular gradient at x=0, and the hard |crossSection|<0.01 band gives no valid gradient-based initialization, so NUTS/HMC cannot run. A faithful gradient-free RandomWalkKernel (WebPPL's MH) mixes pathologically slowly across the cusp between the two symmetric lobes and yields no stable ground truth (GT) at practical cost.

02 answer overlay: webppl vs pyro (dist/real)

[histogram overlay: webppl and pyro · 1632 bins · -0.84 … 0.84]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.3059 (w1)
solver re-derivation	accept	2/2 solvers · d=[0.125, 0.256] · claude-sonnet-4-6
cross-language (pyro vs webppl)	unavailable	query pins MCMC/MH (10000 samples, lag 10); method-pinned and gradient-hostile in Pyro (full rationale in the realization note above)

probmods2-inference-algorithms / ex1.2

answer record(x, y) solver accept pyro unavailable

00 statement source: exercises/inference-algorithms.md

given

A point (x, y) lies on a heart-shaped curve if x² + (y − x^(2/3))² − 1 is within 0.01 of zero. The x-coordinate ranges over [−1, 1] and the y-coordinate over [−1, 1.6]. The proposal distribution draws x and y jointly from a two-dimensional Gaussian with mean at the center of the bounding box (xmu = 0, ymu = 0.3) and standard deviations equal to half the bounding-box width (xsigma = 1, ysigma = 1.3).

model

Draw x and y jointly from the two-dimensional Gaussian described in 'given', then condition on the point being on the curve.

query

The marginal posterior distributions of x and y separately, each as a real-valued distribution, obtained by running MH-MCMC for 1000 samples with a lag of 100.

answer spec record(x, y)

{
  "kind": "record",
  "fields": {
    "x": {
      "kind": "dist",
      "domain": "real"
    },
    "y": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

01 realizations comparing webppl vs pyro

◆ground truth

webppl

var onCurve = function(x, y) {
  var x2 = x*x;
  var term1 = y - Math.pow(x2, 1/3);
  var crossSection = x2 + term1*term1 - 1;
  return Math.abs(crossSection) < 0.01;
};
var xbounds = [-1, 1];
var ybounds = [-1, 1.6];

var xmu = 0.5 * (xbounds[0] + xbounds[1]);
var ymu = 0.5 * (ybounds[0] + ybounds[1]);
var xsigma = 0.5 * (xbounds[1] - xbounds[0]);
var ysigma = 0.5 * (ybounds[1] - ybounds[0]);

var model = function() {
  var xy = diagCovGaussian({mu: Vector([xmu, ymu]),
                            sigma: Vector([xsigma, ysigma])});
  var x = T.get(xy, 0);
  var y = T.get(xy, 1);
  condition(onCurve(x, y));
  return {x: x, y: y};
};
var posterior = Infer({method: 'MCMC',
       samples: 1000,
       lag: 100}, model);
var ANSWER = {
  x: marginalize(posterior, function(p) { return p.x; }),
  y: marginalize(posterior, function(p) { return p.y; })
};

◆realization

unavailable in pyro

The query pins MH-MCMC (1000 samples, lag 100, joint proposal). Because it fixes the inference method and its settings, the target is sampler-specific rather than a determinate posterior, which puts it outside the determination criterion. The model is also gradient-hostile in Pyro: x^(2/3) has a singular gradient at x=0, and the hard |crossSection|<0.01 band gives no valid gradient-based initialization, so NUTS/HMC cannot run. A faithful gradient-free RandomWalkKernel (WebPPL's MH) mixes pathologically slowly across the cusp between the two symmetric lobes and yields no stable ground truth (GT) at practical cost.

02 answer overlay: webppl vs pyro (record(x, y))

[histogram overlay, x: webppl and pyro · 24 bins · -0.96 … 0.96]

[histogram overlay, y: webppl and pyro · 24 bins · -0.92 … 1.46]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.1693 (record)
solver re-derivation	accept	2/2 solvers · d=[0.260, 0.260] · claude-sonnet-4-6
cross-language (pyro vs webppl)	unavailable	query pins MH-MCMC (1000 samples, lag 100, joint proposal); method-pinned and gradient-hostile in Pyro (full rationale in the realization note above)

probmods2-inference-algorithms / ex1.3

answer dist/real solver accept pyro unavailable

00 statement source: exercises/inference-algorithms.md

given

A point (x, y) lies on a heart-shaped curve if x² + (y − x^(2/3))² − 1 is within 0.01 of zero. The x-coordinate ranges over [−1, 1] and the y-coordinate over [−1, 1.6]. Each of x and y is drawn independently from a Gaussian with mean at the center of its bounding-box range and standard deviation equal to half the range (xmu = 0, ymu = 0.3, xsigma = 1, ysigma = 1.3).

model

Draw x and y independently from their respective Gaussians described in 'given', then condition on the point being on the curve.

query

The marginal posterior distribution over y as a real-valued distribution, obtained by running HMC-MCMC for 10000 samples using a leapfrog kernel with 10 steps and step size 0.5.

answer spec dist/real

{
  "kind": "dist",
  "domain": "real"
}

01 realizations comparing webppl vs pyro

◆ground truth

webppl

var onCurve = function(x, y) {
  var x2 = x*x;
  var term1 = y - Math.pow(x2, 1/3);
  var crossSection = x2 + term1*term1 - 1;
  return Math.abs(crossSection) < 0.01;
};
var xbounds = [-1, 1];
var ybounds = [-1, 1.6];

var xmu = 0.5 * (xbounds[0] + xbounds[1]);
var ymu = 0.5 * (ybounds[0] + ybounds[1]);
var xsigma = 0.5 * (xbounds[1] - xbounds[0]);
var ysigma = 0.5 * (ybounds[1] - ybounds[0]);

var model = function() {
  var x = gaussian(xmu, xsigma);
  var y = gaussian(ymu, ysigma);
  condition(onCurve(x, y));
  return {x: x, y: y};
};
var posterior = Infer({method: 'MCMC',
       samples: 10000,
       kernel: {HMC : { steps: 10, stepSize: .5 }} }, model);
var ANSWER = marginalize(posterior, function(p) { return p.y; });

◆realization

unavailable in pyro

The query pins HMC (leapfrog with 10 steps, step size 0.5), which is impossible on this non-differentiable target. Because the query fixes the inference method and its settings, the target is sampler-specific rather than a determinate posterior, which puts it outside the determination criterion. The model is also gradient-hostile in Pyro: x^(2/3) has a singular gradient at x=0, and the hard |crossSection|<0.01 band gives no valid gradient-based initialization, so NUTS/HMC cannot run. A faithful gradient-free RandomWalkKernel (WebPPL's MH) mixes pathologically slowly across the cusp between the two symmetric lobes and yields no stable ground truth (GT) at practical cost.
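
The two gradient problems named in this note can be seen numerically. The sketch below is a small illustration under the assumption that the x^(2/3) term is computed as (x*x)^(1/3), as in the WebPPL code; `analytic_slope` and `on_curve_indicator` are hypothetical helper names.

```python
def cube_root_of_square(x):
    """The x^(2/3) term as the WebPPL code computes it: (x*x)^(1/3)."""
    return (x * x) ** (1.0 / 3.0)

def analytic_slope(x):
    """d/dx (x^2)^(1/3) = (2/3) * sign(x) * |x|^(-1/3); undefined at x = 0."""
    sign = 1.0 if x > 0 else -1.0
    return (2.0 / 3.0) * sign * abs(x) ** (-1.0 / 3.0)

def on_curve_indicator(x, y, tol=0.01):
    """1.0 inside the band, 0.0 outside: the hard condition as a function."""
    term = y - cube_root_of_square(x)
    return 1.0 if abs(x * x + term * term - 1.0) < tol else 0.0

# Problem 1: the slope grows without bound as x approaches 0.
slopes = [analytic_slope(x) for x in (1e-1, 1e-3, 1e-6, 1e-9)]

# Problem 2: the indicator is flat wherever it is defined, so a finite
# difference taken at an on-curve point carries no directional information.
eps = 1e-6
flat_difference = (on_curve_indicator(eps, 1.0) - on_curve_indicator(0.0, 1.0)) / eps
```

Together these show why a leapfrog integrator has nothing useful to follow here: the only gradient signal comes from the Gaussian priors, and the constraint itself contributes either nothing or a singularity.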

02 answer overlay: webppl vs pyro (dist/real)

[histogram overlay: webppl and pyro · 57 bins · -0.69 … 1.51]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.2406 (w1)
solver re-derivation	accept	2/2 solvers · d=[0.223, 0.223] · claude-opus-4-8
cross-language (pyro vs webppl)	unavailable	query pins HMC (leapfrog 10 steps, step 0.5); method-pinned and gradient-hostile in Pyro (full rationale in the realization note above)

probmods2-inference-algorithms / ex2.4

answer dist/real solver accept pyro unavailable

00 statement source: exercises/inference-algorithms.md

given

point1 is fixed at −10. point2 is drawn uniformly from [−100, 100]. interpolationWeight is drawn uniformly from [0, 1]. The interpolated value pointInMiddle = point1 × interpolationWeight + point2 × (1 − interpolationWeight) must satisfy |pointInMiddle| < 0.01.

model

Draw point2 and interpolationWeight from their priors; compute pointInMiddle as the weighted interpolation of point1 and point2; condition hard on |pointInMiddle| < 0.01.

query

The marginal posterior distribution over interpolationWeight as a real-valued distribution, using rejection sampling with 1000 samples.

answer spec dist/real

{
  "kind": "dist",
  "domain": "real"
}

01 realizations comparing webppl vs pyro

◆ground truth

webppl

var interpolate = function(point1, point2, interpolationWeight) {
  return (point1 * interpolationWeight +
          point2 * (1 - interpolationWeight));
};

var model = function(){
  var point1 = -10;
  var point2 = uniform(-100, 100);
  var interpolationWeight = uniform(0, 1);
  var pointInMiddle = interpolate(point1, point2, interpolationWeight);
  condition(Math.abs(pointInMiddle) < 0.01);
  return {point2: point2, interpolationWeight: interpolationWeight, pointInMiddle: pointInMiddle};
};
var posterior = Infer({method: 'rejection', samples: 1000}, model);
var ANSWER = marginalize(posterior, function(x) { return x.interpolationWeight; });

◆realization

unavailable in pyro

Method-pinned (query: rejection sampling, 1000 samples) on a thin acceptance band |pointInMiddle|<0.01 with prior acceptance ~2e-4. The posterior over interpolationWeight is determinate, but it cannot be certified in Pyro at practical cost. Plain Importance is accurate in principle, but at feasible sample counts it yields too few accepted samples to be well-posed, and it times out at the counts needed to bring the noise floor under the discriminability cap. A guided proposal would need fragile hand-tuning against the bounded point2 prior. This problem belongs to the same inference-algorithms method-demo family as ex1.1/1.2/1.3.
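
The quoted prior acceptance of about 2e-4 can be checked directly. For a fixed interpolationWeight w, the condition holds when point2 falls in an interval of half-width 0.01/(1 - w) centred at 10w/(1 - w), and point2 is uniform on a range of width 200. The sketch below integrates that over w numerically and compares it with the closed form 1e-4 · ln(11), which follows when the interval lies fully inside [-100, 100], that is, for w up to about 10/11. The function names and the grid size are illustrative assumptions.

```python
import math

POINT1, HALF_BAND = -10.0, 0.01
LO, HI = -100.0, 100.0

def accept_prob_given_weight(w):
    """P(|pointInMiddle| < HALF_BAND | interpolationWeight = w), point2 uniform."""
    center = -POINT1 * w / (1.0 - w)     # point2 that puts pointInMiddle at 0
    half = HALF_BAND / (1.0 - w)
    overlap = min(HI, center + half) - max(LO, center - half)
    return max(0.0, overlap) / (HI - LO)

def prior_acceptance(grid=200000):
    """Midpoint-rule integral of the conditional acceptance over w in (0, 1)."""
    return sum(accept_prob_given_weight((i + 0.5) / grid) for i in range(grid)) / grid

acceptance = prior_acceptance()
closed_form = 1e-4 * math.log(11.0)

def posterior_density(w):
    """Posterior density of interpolationWeight, proportional to 1 / (1 - w)."""
    return accept_prob_given_weight(w) / acceptance
```

The same calculation gives the shape of the answer: the posterior density rises like 1/(1 - w) and drops to zero just past w = 10/11, consistent with the upper end of the overlay range.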

02 answer overlay: webppl vs pyro (dist/real)

[histogram overlay: webppl and pyro · 1000 bins · 0.00 … 0.91]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0221 (w1)
solver re-derivation	accept	2/2 solvers · d=[0.010, 0.010] · claude-sonnet-4-6
cross-language (pyro vs webppl)	unavailable	method-pinned (rejection sampling, 1000 samples) on a thin acceptance band; cannot be certified in Pyro at practical cost (full rationale in the realization note above)

probmods2-learning-as-conditional-inference / ex1.1

answer value/realvec solver accept pyro pass 0.0167

00 statement source: exercises/learning-as-conditional-inference.md

given

A coin is fair (weight 0.5) with prior probability 0.9, and biased with prior probability 0.1. Among biased coins, the weight is 1 (two-faced) with probability 0.7 and drawn uniformly from (0, 1) with probability 0.3. Each toss follows a Bernoulli distribution with the coin's weight. The full dataset consists of 50 heads. The observed data sizes to evaluate at are [0, 1, 2, 4, 6, 8, 10, 12, 15, 20, 25, 30, 40, 50].

model

A coin is drawn from a two-component prior: with probability 0.9 it is fair; otherwise it is biased, and within the biased class it is two-faced (weight 1) with probability 0.7 or has a uniformly-drawn weight with probability 0.3. Each observed toss is independently generated from the Bernoulli distribution with the coin's weight.

query

For each prefix of the full 50-heads dataset of length N in [0, 1, 2, 4, 6, 8, 10, 12, 15, 20, 25, 30, 40, 50], compute the posterior expected coin weight given the first N observations, using MCMC with 1000 burn-in steps and 10000 samples. Return the array of 14 expected weights.
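
Because every observation is heads, the posterior expected weight also has a closed form, which gives a quick sanity check on the MCMC estimates. The sketch below is an independent derivation and not part of either realization; the function name is hypothetical. It mixes the three prior components (fair, two-faced, uniform) by their marginal likelihoods.

```python
def expected_weight(n_heads: int) -> float:
    """Exact posterior mean of the coin weight after n_heads heads and no tails."""
    # Prior mass of each component: fair, two-faced, uniform-weight.
    priors = [0.9, 0.1 * 0.7, 0.1 * 0.3]

    # Marginal likelihood of n_heads heads under each component.
    likelihoods = [
        0.5 ** n_heads,         # fair coin
        1.0,                    # two-faced coin always lands heads
        1.0 / (n_heads + 1),    # integral of w**n over (0, 1)
    ]

    # Posterior mean of the weight within each component.
    means = [
        0.5,
        1.0,
        (n_heads + 1) / (n_heads + 2),  # mean of Beta(n + 1, 1)
    ]

    joint = [p * l for p, l in zip(priors, likelihoods)]
    return sum(j * m for j, m in zip(joint, means)) / sum(joint)


sizes = [0, 1, 2, 4, 6, 8, 10, 12, 15, 20, 25, 30, 40, 50]
exact = [expected_weight(n) for n in sizes]
print([round(v, 4) for v in exact])
```

With no data this reduces to the prior mean, 0.9 · 0.5 + 0.07 · 1 + 0.03 · 0.5 = 0.535, and it approaches 1 as the run of heads grows.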

answer spec value/realvec

{
  "kind": "value",
  "domain": "realvec",
  "estimated": true
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var weightPosterior = function(observedData) {
  return Infer({method: 'MCMC', burn:1000, samples: 10000}, function() {
    var isFair = flip(0.9);
    var isTwoFaced = flip(0.7);
    var realWeight = isFair ? 0.5 : (isTwoFaced ? 1 : uniform({a:0, b:1}));
    var coin = Bernoulli({p: realWeight});
    var obsFn = function(datum) { observe(coin, datum=='h') };
    mapData({data: observedData}, obsFn);
    return realWeight;
  })
};

var fullDataSet = repeat(50, function() { 'h' });
var observedDataSizes = [0,1,2,4,6,8,10,12,15,20,25,30,40,50];
var ANSWER = (map(function(N) { expectation(weightPosterior(fullDataSet.slice(0, N))) }, observedDataSizes));

◆ realization 0.017

python

# Two-component coin prior: fair (0.5) w.p. 0.9; else biased -> two-faced (weight 1)
# w.p. 0.7 or uniform(0,1) w.p. 0.3. Each toss Bernoulli(weight). Observe N heads for
# each N in observedDataSizes; return posterior expected weight for each prefix.
# Discrete latents (isFair, isTwoFaced) sampled; continuous uniform weight handled by
# importance over the model. A single Importance path covers all N including 0: with
# no observations Importance reduces to prior sampling, so the same normalized-weight
# expectation gives the prior mean -- no separate hand-rolled loop.

# Standalone imports (the harness is assumed to preload these).
import torch
import pyro
import pyro.distributions as dist
import pyro.infer

observedDataSizes = [0, 1, 2, 4, 6, 8, 10, 12, 15, 20, 25, 30, 40, 50]

def make_model(num_heads, n_obs):
    def model():
        isFair = pyro.sample("isFair", dist.Bernoulli(0.9))
        isTwoFaced = pyro.sample("isTwoFaced", dist.Bernoulli(0.7))
        if bool(isFair.item()):
            weight = torch.tensor(0.5)
        elif bool(isTwoFaced.item()):
            weight = torch.tensor(1.0)
        else:
            weight = pyro.sample("weight", dist.Uniform(0.0, 1.0))
        if n_obs > 0:
            # all observed tosses are heads
            pyro.sample("obs", dist.Binomial(n_obs, weight), obs=torch.tensor(float(num_heads)))
        return weight
    return model

expected = []
for N in observedDataSizes:
    mdl = make_model(N, N)  # full dataset is 50 heads, so a prefix of length N is N heads
    posterior = pyro.infer.Importance(mdl, num_samples=5000).run()
    # Self-normalize the importance weights, then average the returned weights.
    lw = torch.tensor([posterior.log_weights[i] for i in range(len(posterior.log_weights))])
    w = torch.softmax(lw, dim=0)
    vals = torch.tensor([float(tr.nodes["_RETURN"]["value"].item())
                         for tr in posterior.exec_traces])
    expected.append(float((w * vals).sum().item()))

ANSWER = expected

02 answer · value/realvec

webppl

[0.5355, 0.5659, 0.6160, 0.7918, 0.9311, 0.9643, 0.9920, 0.9960, 0.9982, 0.9990, 0.9992, 0.9996, 0.9998, 0.9998]

pyro

[0.5360, 0.5696, 0.6280, 0.7891, 0.9190, 0.9712, 0.9896, 0.9956, 0.9982, 0.9989, 0.9994, 0.9996, 0.9999, 0.9997]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0470 (absdiff)
solver re-derivation	accept	2/2 solvers · d=[0.024, 0.019] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0167 ≤ tol 0.0939 · floors 0.0137/0.0470

probmods2-learning-as-conditional-inference / ex2.1

answer record(prior, post) · solver accept · pyro pass 0.0034

00 statement source: exercises/learning-as-conditional-inference.md

given

A coin's weight is given a Beta(10, 10) prior. The full dataset alternates heads and tails 50 times each, giving a sequence of 100 observations (h, t, h, t, …).

model

The coin weight is drawn from the prior. Each toss is independently Bernoulli-distributed with the coin's weight as its success probability. The posterior is inferred via MCMC with 1000 burn-in steps and 1000 samples.

query

Return a record with two distributions: the prior distribution over coin weight, and the posterior distribution over coin weight after conditioning on all 100 observations. The prior field must be the parametric Beta distribution object itself (the prior as a distribution, not samples drawn from it).
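
Since the Beta prior is conjugate to the Bernoulli likelihood, the posterior can also be written down exactly and used as a reference for the sampled one. The sketch below assumes the 50 heads and 50 tails stated above; the helper names are hypothetical and it is a cross-check, not the benchmarked method.

```python
import math


def beta_update(a, b, heads, tails):
    """Conjugate update: Beta(a, b) prior plus Bernoulli counts gives a Beta posterior."""
    return a + heads, b + tails


def beta_mean_sd(a, b):
    """Mean and standard deviation of Beta(a, b)."""
    total = a + b
    mean = a / total
    sd = math.sqrt(a * b / (total ** 2 * (total + 1)))
    return mean, sd


prior = (10, 10)
posterior = beta_update(*prior, heads=50, tails=50)

prior_mean, prior_sd = beta_mean_sd(*prior)
post_mean, post_sd = beta_mean_sd(*posterior)
print(posterior, round(post_mean, 4), round(post_sd, 4))
```

The exact posterior is Beta(60, 60): the mean stays at 0.5 while the standard deviation shrinks from about 0.109 to about 0.045.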

answer spec record(prior, post)

{
  "kind": "record",
  "fields": {
    "prior": {
      "kind": "dist",
      "domain": "real"
    },
    "post": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var pseudoCounts = {a: 10, b: 10};

var weightPosterior = function(observedData){
  return Infer({method: 'MCMC', burn:1000, samples: 1000}, function() {
    var coinWeight = sample(Beta(pseudoCounts));
    var coinDist = Bernoulli({p: coinWeight});
    var obsFn = function(datum){ observe(coinDist, datum=='h') };
    mapData({data: observedData}, obsFn);
    return coinWeight;
  })
};

var fullDataSet = repeat(50, function() { ['h', 't'] }).flat();
var ANSWER = (({
  prior: Beta(pseudoCounts),
  post: weightPosterior(fullDataSet)
}));

◆ realization 0.003

python

# Beta-Bernoulli: prior Beta(10,10), 100 observations alternating h,t.
# Prior field is the parametric Beta distribution object itself.
# Posterior over the coin weight inferred via MCMC (NUTS).

# Standalone imports (the harness is assumed to preload these).
import torch
import pyro
import pyro.distributions as dist
import pyro.infer

observations = torch.tensor([1.0, 0.0] * 50)

def model():
    weight = pyro.sample("weight", dist.Beta(10.0, 10.0))
    with pyro.plate("data", observations.shape[0]):
        pyro.sample("obs", dist.Bernoulli(weight), obs=observations)
    return weight

kernel = pyro.infer.NUTS(model)
mcmc = pyro.infer.MCMC(kernel, num_samples=1000, warmup_steps=1000,
                       disable_progbar=True)
mcmc.run()

ANSWER = {
    "prior": dist.Beta(10.0, 10.0),
    "post": mcmc.get_samples()["weight"],
}

02 answer overlay (webppl vs pyro) · record(prior, post)

prior

"beta(a: 10, b: 10)"

post

[overlay histogram: webppl vs pyro · 24 bins · 0.37 … 0.64]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0069 (record)
solver re-derivation	accept	2/2 solvers · d=[0.005, 0.005] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0034 ≤ tol 0.0138 · floors 0.0066/0.0069

probmods2-learning-as-conditional-inference / ex2.2

answer value/realvec · solver accept · pyro pass 0.0011

00 statement source: exercises/learning-as-conditional-inference.md

given

A coin's weight is given a Beta(10, 10) prior. The full dataset alternates heads and tails 256 times each, giving a sequence of 512 observations (h, t, h, t, …). The data-size checkpoints to evaluate at are [0, 2, 4, 8, 16, 32, 64, 128, 256, 512].

model

The coin weight is drawn from the prior. Each toss is independently Bernoulli-distributed with the coin's weight as its success probability. At each checkpoint N, the posterior over coin weight is inferred from the first N observations via MCMC with 1000 burn-in steps and 1000 samples.

query

For each checkpoint N in [0, 2, 4, 8, 16, 32, 64, 128, 256, 512], compute the variance of the posterior over coin weight given the first N observations. The variance is the posterior expected squared deviation from the posterior mean. Return the array of 10 variances.
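
For reference, the variance at each checkpoint also follows from the conjugate Beta posterior. The sketch below is only a cross-check on the estimated values (the query itself pins MCMC); it assumes the sequence starts with heads, as stated above, and the helper name is hypothetical.

```python
def beta_variance(a, b):
    """Variance of a Beta(a, b) distribution."""
    total = a + b
    return a * b / (total ** 2 * (total + 1))


checkpoints = [0, 2, 4, 8, 16, 32, 64, 128, 256, 512]
exact_variances = []
for n in checkpoints:
    heads = (n + 1) // 2  # the alternating sequence starts with 'h'
    tails = n // 2
    exact_variances.append(beta_variance(10 + heads, 10 + tails))

print([round(v, 4) for v in exact_variances])
```

Every checkpoint is even, so heads and tails are balanced and the variance simplifies to 1 / (4 (N + 21)), falling from 1/84 at N = 0 to 1/2132 at N = 512.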

answer spec value/realvec

{
  "kind": "value",
  "domain": "realvec",
  "estimated": true
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var pseudoCounts = {a: 10, b: 10};

var weightPosterior = function(observedData){
  return Infer({method: 'MCMC', burn:1000, samples: 1000}, function() {
    var coinWeight = sample(Beta(pseudoCounts));
    var coinDist = Bernoulli({p: coinWeight});
    var obsFn = function(datum){ observe(coinDist, datum=='h') };
    mapData({data: observedData}, obsFn);
    return coinWeight;
  })
};

var fullDataSet = repeat(256, function(){['h', 't']}).flat();
var observedDataSizes = [0,2,4,8,16,32,64,128,256,512];
var ANSWER = (map(function(N) {
  var posterior = weightPosterior(fullDataSet.slice(0,N));
  var mean = expectation(posterior);
  return expectation(posterior, function(x) { Math.pow(x - mean, 2) });
}, observedDataSizes));

◆ realization 0.001

python

# probmods2-learning-as-conditional-inference/ex2.2
# Beta-Bernoulli coin: prior Beta(a=10, b=10); data is the alternating sequence
# ['h','t'] repeated 256 times (512 observations). For each checkpoint N, condition
# on the first N observations and report the posterior VARIANCE of the coin weight.
# The posterior comes from running Pyro NUTS over the Beta-Bernoulli model (the
# values are estimated, as the query and spec require) -- no conjugate formula.

# Standalone imports (the harness is assumed to preload all but pyro.infer).
import torch
import pyro
import pyro.distributions as dist
import pyro.infer

pseudo_a = 10.0
pseudo_b = 10.0

# repeat(256, ['h','t']).flat()
full_data = ['h', 't'] * 256
observed_sizes = [0, 2, 4, 8, 16, 32, 64, 128, 256, 512]

def make_model(obs_list):
    if len(obs_list) > 0:
        obs_tensor = torch.tensor([1.0 if d == 'h' else 0.0 for d in obs_list])
    else:
        obs_tensor = None

    def model():
        coin_weight = pyro.sample("coin_weight",
                                  dist.Beta(torch.tensor(pseudo_a), torch.tensor(pseudo_b)))
        if obs_tensor is not None:
            with pyro.plate("data", obs_tensor.shape[0]):
                pyro.sample("obs", dist.Bernoulli(coin_weight), obs=obs_tensor)
        return coin_weight

    return model

variances = []
for N in observed_sizes:
    model = make_model(full_data[:N])
    kernel = pyro.infer.NUTS(model)
    mcmc = pyro.infer.MCMC(kernel, num_samples=1000, warmup_steps=500,
                           disable_progbar=True)
    mcmc.run()
    samples = mcmc.get_samples()["coin_weight"]
    # Population variance of the samples: mean squared deviation from the sample mean.
    mean = samples.mean()
    var = ((samples - mean) ** 2).mean()
    variances.append(var.item())

ANSWER = variances

02 answer · value/realvec

webppl

[0.0119, 0.0107, 0.0101, 0.0088, 0.0066, 0.0045, 0.0029, 0.0018, 0.0009, 0.0005]

pyro

[0.0122, 0.0115, 0.0097, 0.0087, 0.0070, 0.0047, 0.0028, 0.0015, 0.0009, 0.0005]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0019 (absdiff)
solver re-derivation	accept	2/2 solvers · d=[0.001, 0.001] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0011 ≤ tol 0.0039 · floors 0.0018/0.0019

probmods2-mixture-models / ex1.a

answer record(low_antennae, low_green, low_blargh, high_antennae, high_green, high_blargh) · solver accept · pyro pass 0.0374

00 statement source: exercises/mixture-models.md

given

Ten aliens are observed, each with three binary properties: antennae, green, blarghNoise. The observations are: alien 1: antennae=false, green=false, blarghNoise=false; alien 2: antennae=true, green=true, blarghNoise=true; alien 3: antennae=true, green=true, blarghNoise=true; alien 4: antennae=true, green=true, blarghNoise=true; alien 5: antennae=false, green=false, blarghNoise=false; alien 6: antennae=true, green=true, blarghNoise=true; alien 7: antennae=false, green=false, blarghNoise=false; alien 8: antennae=true, green=true, blarghNoise=true; alien 9: antennae=false, green=false, blarghNoise=false; alien 10: antennae=false, green=false, blarghNoise=false. There are two latent alien kinds. For each kind, the probability of each of the three binary properties is drawn independently from a Beta(0.5, 0.5) prior. Each alien independently belongs to either kind with equal prior probability (0.5 each), and its three properties are each independently drawn from the Bernoulli distribution with the kind's corresponding property probability. The group prototypes are shared across aliens of the same kind within one inference run. Inference uses MCMC with an HMC kernel (10 leapfrog steps, step size 0.01) for 3000 samples.

model

A two-component mixture model over alien kinds. Each kind has a prototype: three independent property probabilities drawn from Beta(0.5, 0.5). Each alien's kind is drawn 50/50, and its three binary properties are drawn independently from Bernoulli distributions parameterized by the kind's prototype. The prototype is shared (memoized) within one inference run.

query

Compute the posterior mean of each group's property probabilities, sorted so that the group with the lower posterior mean antennae probability is the 'low' group and the other is the 'high' group. Return a record with six fields: low_antennae, low_green, low_blargh, high_antennae, high_green, high_blargh — each the posterior expected probability for that group and property.
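
A rough reference point for these means can be computed by hand under one simplifying assumption: that the five all-false aliens sit in one kind and the five all-true aliens in the other. The full posterior averages over all assignments, so the sketch below conditions on a single assignment and is only an approximation; the helper name is hypothetical.

```python
def beta_posterior_mean(a, b, successes, failures):
    """Posterior mean of a Bernoulli probability under a Beta(a, b) prior."""
    return (a + successes) / (a + b + successes + failures)


# Assumed clean split: the low kind sees 5 aliens with the property absent,
# the high kind sees 5 aliens with the property present.
low_mean = beta_posterior_mean(0.5, 0.5, successes=0, failures=5)
high_mean = beta_posterior_mean(0.5, 0.5, successes=5, failures=0)
print(round(low_mean, 4), round(high_mean, 4))
```

Under that assumption each low-group property probability has mean 0.5/6 and each high-group one has mean 5.5/6, the same for all three properties because the observed aliens are either all-true or all-false.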

answer spec record(low_antennae, low_green, low_blargh, high_antennae, high_green, high_blargh)

{
  "kind": "record",
  "fields": {
    "low_antennae": {
      "kind": "value",
      "domain": "real",
      "estimated": true
    },
    "low_green": {
      "kind": "value",
      "domain": "real",
      "estimated": true
    },
    "low_blargh": {
      "kind": "value",
      "domain": "real",
      "estimated": true
    },
    "high_antennae": {
      "kind": "value",
      "domain": "real",
      "estimated": true
    },
    "high_green": {
      "kind": "value",
      "domain": "real",
      "estimated": true
    },
    "high_blargh": {
      "kind": "value",
      "domain": "real",
      "estimated": true
    }
  }
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var properties = ['antennae', 'green', 'blarghNoise'];
var data = [
  {antennae : false, green: false, blarghNoise: false},
  {antennae : true,  green: true,  blarghNoise: true},
  {antennae : true,  green: true,  blarghNoise: true},
  {antennae : true,  green: true,  blarghNoise: true},
  {antennae : false, green: false, blarghNoise: false},
  {antennae : true,  green: true,  blarghNoise: true},
  {antennae : false, green: false, blarghNoise: false},
  {antennae : true,  green: true,  blarghNoise: true},
  {antennae : false, green: false, blarghNoise: false},
  {antennae : false, green: false, blarghNoise: false}
];

var sampleGroupPrototype = mem(function(groupName) {
  var probs = repeat(3, function(){ beta(.5, .5)});
  return _.zipObject(properties, probs);
});
var posterior = Infer({method: 'MCMC', kernel: {HMC: {steps: 10, stepSize: .01}}, samples: 3000},
      function(){
  mapData({data: data}, function(datum) {
    var group = flip() ? 'group1' : 'group2';
    var prototype = sampleGroupPrototype(group);
    mapData({data: properties}, function(property) {
      observe(Bernoulli({p: prototype[property]}), datum[property]);
    });
  });
  return {group1: sampleGroupPrototype('group1'),
          group2: sampleGroupPrototype('group2')};
});
var g1Mean = expectation(posterior, function(s) { return s.group1.antennae });
var g2Mean = expectation(posterior, function(s) { return s.group2.antennae });
var lowGroup  = g1Mean < g2Mean ? 'group1' : 'group2';
var highGroup = g1Mean < g2Mean ? 'group2' : 'group1';
var ANSWER = ({
  low_antennae:  expectation(posterior, function(s) { return s[lowGroup].antennae }),
  low_green:     expectation(posterior, function(s) { return s[lowGroup].green }),
  low_blargh:    expectation(posterior, function(s) { return s[lowGroup].blarghNoise }),
  high_antennae: expectation(posterior, function(s) { return s[highGroup].antennae }),
  high_green:    expectation(posterior, function(s) { return s[highGroup].green }),
  high_blargh:   expectation(posterior, function(s) { return s[highGroup].blarghNoise })
});

◆ realization 0.037

python

# Two-component mixture over 10 aliens, 3 binary properties each.
# Each group prototype = 3 independent Beta(.5,.5) property probs (continuous).
# Each alien's group ~ Bernoulli(.5) (discrete); properties ~ Bernoulli(prototype).
# Continuous prototypes sampled by NUTS; per-alien group assignments marginalized
# by enumeration. Group labels are non-identifiable, so each posterior sample's two
# prototypes are sorted by antennae prob before averaging (this matches the
# reference's sort-by-antennae when the chain does not switch labels, and is robust
# if it does).

# Standalone imports (the harness is assumed to preload these).
import torch
import pyro
import pyro.distributions as dist
import pyro.infer

data = torch.tensor([
    [0., 0., 0.],
    [1., 1., 1.],
    [1., 1., 1.],
    [1., 1., 1.],
    [0., 0., 0.],
    [1., 1., 1.],
    [0., 0., 0.],
    [1., 1., 1.],
    [0., 0., 0.],
    [0., 0., 0.],
], dtype=torch.float64)

@pyro.infer.config_enumerate
def model():
    p1 = pyro.sample('p1', dist.Beta(0.5, 0.5).expand([3]).to_event(1))
    p2 = pyro.sample('p2', dist.Beta(0.5, 0.5).expand([3]).to_event(1))
    protos = torch.stack([p1, p2], dim=0)  # [2, 3]
    with pyro.plate('aliens', 10):
        g = pyro.sample('g', dist.Bernoulli(0.5))  # enumerated: 0->group1, 1->group2
        idx = g.long()
        proto = protos[idx]  # broadcasts over enum dim -> [..., 3]
        pyro.sample('obs', dist.Bernoulli(proto).to_event(1), obs=data)

kernel = pyro.infer.NUTS(model)
mcmc = pyro.infer.MCMC(kernel, num_samples=900, warmup_steps=500)
mcmc.run()
samples = mcmc.get_samples()
p1_s = samples['p1'].to(torch.float64)  # [N, 3]
p2_s = samples['p2'].to(torch.float64)  # [N, 3]

# Per-sample sort: low group = the one with smaller antennae (index 0) probability.
p1_is_low = (p1_s[:, 0] <= p2_s[:, 0]).unsqueeze(-1)  # [N,1]
low = torch.where(p1_is_low, p1_s, p2_s)   # [N,3]
high = torch.where(p1_is_low, p2_s, p1_s)  # [N,3]
low_mean = low.mean(dim=0)
high_mean = high.mean(dim=0)

ANSWER = {
    'low_antennae': low_mean[0].item(),
    'low_green': low_mean[1].item(),
    'low_blargh': low_mean[2].item(),
    'high_antennae': high_mean[0].item(),
    'high_green': high_mean[1].item(),
    'high_blargh': high_mean[2].item(),
}

02 answer overlay (webppl vs pyro) · record(low_antennae, low_green, low_blargh, high_antennae, high_green, high_blargh)

low_antennae

0.0824

low_green

0.0775

low_blargh

0.0671

high_antennae

0.8851

high_green

0.9076

high_blargh

0.9182

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0676 (record)
solver re-derivation	accept	2/2 solvers · d=[0.029, 0.042] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0374 ≤ tol 0.1352 · floors 0.0088/0.0676

probmods2-mixture-models / ex1.b

answer record(low_antennae, low_green, low_blargh, high_antennae, high_green, high_blargh, p_mystery_from_high) · solver accept · pyro pass 0.0272

00 statement source: exercises/mixture-models.md

given

Ten aliens are observed, each with three binary properties: antennae, green, blarghNoise. The observations are: alien 1: antennae=false, green=false, blarghNoise=false; alien 2: antennae=true, green=true, blarghNoise=true; alien 3: antennae=true, green=true, blarghNoise=true; alien 4: antennae=true, green=true, blarghNoise=true; alien 5: antennae=false, green=false, blarghNoise=false; alien 6: antennae=true, green=true, blarghNoise=true; alien 7: antennae=false, green=false, blarghNoise=false; alien 8: antennae=true, green=true, blarghNoise=true; alien 9: antennae=false, green=false, blarghNoise=false; alien 10: antennae=false, green=false, blarghNoise=false. There are two latent alien kinds. For each kind, the probability of each of the three binary properties is drawn independently from a Beta(0.5, 0.5) prior. Each alien independently belongs to either kind with equal prior probability (0.5 each), and its three properties are each independently drawn from the Bernoulli distribution with the kind's corresponding property probability. The group prototypes are shared (memoized) across aliens of the same kind within one inference run. Inference uses MCMC with an HMC kernel (10 leapfrog steps, step size 0.01) for 6000 samples. Additionally, a blargh sound is heard from a crater but the alien cannot be seen. This mystery alien belongs to either kind with equal prior probability.

model

Extend the ex1.a mixture model with one additional latent alien: the mystery alien's kind is drawn with equal probability from the two kinds, and its blarghNoise property is observed to be true. The prototypes are shared across all aliens, including the mystery alien.

query

Compute the posterior mean of each group's property probabilities, sorted so that the group with the lower posterior mean antennae probability is the 'low' group and the other is the 'high' group. Also compute the posterior probability that the mystery alien belongs to the high group. Return a record with seven fields: low_antennae, low_green, low_blargh, high_antennae, high_green, high_blargh, p_mystery_from_high. Estimate with MCMC using an HMC kernel (10 leapfrog steps, step size 0.01) and 6000 posterior samples.
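
For fixed prototypes, the mystery alien's kind follows from Bayes' rule with equal prior odds, and the queried probability is the average of that quantity over posterior draws of the prototypes. The sketch below illustrates the calculation only; the function names are hypothetical and the two draws are made-up numbers, not benchmark output.

```python
def p_high_given_blargh(blargh_low, blargh_high):
    """P(mystery alien is the high kind | blargh heard), with a 50/50 prior on kinds."""
    return blargh_high / (blargh_high + blargh_low)


def average_over_draws(draws):
    """Average the per-draw probability over (blargh_low, blargh_high) pairs."""
    values = [p_high_given_blargh(lo, hi) for lo, hi in draws]
    return sum(values) / len(values)


# Hypothetical posterior draws of the two kinds' blarghNoise probabilities.
example_draws = [(0.1, 0.9), (0.2, 0.8)]
p_mystery_from_high = average_over_draws(example_draws)
print(round(p_mystery_from_high, 4))
```

Averaging per draw is not the same as plugging the posterior-mean prototypes into the formula, because the ratio is nonlinear in the prototype probabilities.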

answer spec record(low_antennae, low_green, low_blargh, high_antennae, high_green, high_blargh, p_mystery_from_high)

{
  "kind": "record",
  "fields": {
    "low_antennae": {
      "kind": "value",
      "domain": "real",
      "estimated": true
    },
    "low_green": {
      "kind": "value",
      "domain": "real",
      "estimated": true
    },
    "low_blargh": {
      "kind": "value",
      "domain": "real",
      "estimated": true
    },
    "high_antennae": {
      "kind": "value",
      "domain": "real",
      "estimated": true
    },
    "high_green": {
      "kind": "value",
      "domain": "real",
      "estimated": true
    },
    "high_blargh": {
      "kind": "value",
      "domain": "real",
      "estimated": true
    },
    "p_mystery_from_high": {
      "kind": "value",
      "domain": "real",
      "estimated": true
    }
  }
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var properties = ['antennae', 'green', 'blarghNoise'];
var data = [
  {antennae : false, green: false, blarghNoise: false},
  {antennae : true,  green: true,  blarghNoise: true},
  {antennae : true,  green: true,  blarghNoise: true},
  {antennae : true,  green: true,  blarghNoise: true},
  {antennae : false, green: false, blarghNoise: false},
  {antennae : true,  green: true,  blarghNoise: true},
  {antennae : false, green: false, blarghNoise: false},
  {antennae : true,  green: true,  blarghNoise: true},
  {antennae : false, green: false, blarghNoise: false},
  {antennae : false, green: false, blarghNoise: false}
];
var sampleGroupPrototype = mem(function(groupName) {
  var probs = repeat(3, function(){ beta(.5, .5)});
  return _.zipObject(properties, probs);
});
var posterior = Infer({method: 'MCMC', kernel: {HMC: {steps: 10, stepSize: .01}}, samples: 6000},
      function(){
  mapData({data: data}, function(datum) {
    var group = flip() ? 'group1' : 'group2';
    var prototype = sampleGroupPrototype(group);
    mapData({data: properties}, function(property) {
      observe(Bernoulli({p: prototype[property]}), datum[property]);
    });
  });
  var mysteryGroup = flip() ? 'group1' : 'group2';
  var mysteryPrototype = sampleGroupPrototype(mysteryGroup);
  observe(Bernoulli({p: mysteryPrototype['blarghNoise']}), true);
  return {group1: sampleGroupPrototype('group1'),
          group2: sampleGroupPrototype('group2'),
          mysteryGroup: mysteryGroup};
});
var g1Mean = expectation(posterior, function(s) { return s.group1.antennae });
var g2Mean = expectation(posterior, function(s) { return s.group2.antennae });
var highGroup = g1Mean >= g2Mean ? 'group1' : 'group2';
var lowGroup  = g1Mean >= g2Mean ? 'group2' : 'group1';
var ANSWER = ({
  low_antennae:  expectation(posterior, function(s) { return s[lowGroup].antennae }),
  low_green:     expectation(posterior, function(s) { return s[lowGroup].green }),
  low_blargh:    expectation(posterior, function(s) { return s[lowGroup].blarghNoise }),
  high_antennae: expectation(posterior, function(s) { return s[highGroup].antennae }),
  high_green:    expectation(posterior, function(s) { return s[highGroup].green }),
  high_blargh:   expectation(posterior, function(s) { return s[highGroup].blarghNoise }),
  p_mystery_from_high: expectation(posterior, function(s) { return s.mysteryGroup === highGroup ? 1 : 0 })
});

◆ realization 0.027

python

# Two-kind alien mixture. Continuous prototype probabilities (Beta(.5,.5) per
# property per kind) are the only true continuous latents; the per-alien kind
# assignments and the mystery alien's kind are discrete and are marginalized out
# with config_enumerate so NUTS samples only the 6 continuous prototype params.

# Standalone imports (the harness is assumed to preload these).
import torch
import pyro
import pyro.distributions as dist
import pyro.infer

properties = ["antennae", "green", "blarghNoise"]
data = [
    {"antennae": False, "green": False, "blarghNoise": False},
    {"antennae": True,  "green": True,  "blarghNoise": True},
    {"antennae": True,  "green": True,  "blarghNoise": True},
    {"antennae": True,  "green": True,  "blarghNoise": True},
    {"antennae": False, "green": False, "blarghNoise": False},
    {"antennae": True,  "green": True,  "blarghNoise": True},
    {"antennae": False, "green": False, "blarghNoise": False},
    {"antennae": True,  "green": True,  "blarghNoise": True},
    {"antennae": False, "green": False, "blarghNoise": False},
    {"antennae": False, "green": False, "blarghNoise": False},
]

# data tensor: 10 aliens x 3 properties (1.0/0.0)
data_t = torch.tensor([[1.0 if d[p] else 0.0 for p in properties] for d in data])
n_aliens = len(data)


@pyro.infer.config_enumerate
def model():
    # group prototypes: for each group (2) and property (3), a Beta(.5,.5) prob.
    # shape: (2 groups, 3 properties)
    proto = pyro.sample(
        "proto",
        dist.Beta(0.5 * torch.ones(2, 3), 0.5 * torch.ones(2, 3)).to_event(2),
    )
    # each observed alien: pick a group (uniform), observe its 3 properties.
    with pyro.plate("aliens", n_aliens):
        group = pyro.sample("group", dist.Categorical(torch.ones(2) / 2))
        # proto[group]: gather per-alien property probs -> shape (..., n_aliens, 3)
        p = proto[group]  # advanced indexing over enumerated group dim
        pyro.sample("obs", dist.Bernoulli(p).to_event(1), obs=data_t)
    # mystery alien: pick a group (uniform), observe blarghNoise == True.
    mystery = pyro.sample("mystery", dist.Categorical(torch.ones(2) / 2))
    pm = proto[mystery][..., 2]  # blarghNoise is index 2
    pyro.sample("mystery_obs", dist.Bernoulli(pm), obs=torch.tensor(1.0))


nuts = pyro.infer.NUTS(model)
mcmc = pyro.infer.MCMC(nuts, num_samples=1500, warmup_steps=600)
mcmc.run()
proto_samples = mcmc.get_samples()["proto"]  # (S, 2, 3)
S = proto_samples.shape[0]

# Posterior mean of each group's antennae prob; lower-mean group is 'low'.
g0_ant = proto_samples[:, 0, 0].mean()
g1_ant = proto_samples[:, 1, 0].mean()
if g0_ant <= g1_ant:
    low, high = 0, 1
else:
    low, high = 1, 0

low_means = proto_samples[:, low, :].mean(dim=0)
high_means = proto_samples[:, high, :].mean(dim=0)

# Posterior probability the mystery alien is from the high group. The discrete
# `mystery` site was enumerated out during NUTS, so recover its posterior by
# running Pyro's exact discrete inference on a model that fixes the prototype
# probabilities to each NUTS draw (plated over the S draws) and observes
# blarghNoise == True for the mystery alien. compute_marginals returns the exact
# per-draw marginal P(mystery | proto, obs); averaging P(mystery = high) over the
# proto posterior gives the queried probability.
blargh = proto_samples[:, :, 2].clamp(1e-9, 1 - 1e-9)  # (S, 2): per-draw blargh prob per group


@pyro.infer.config_enumerate
def mystery_model(blargh):
    with pyro.plate("samples", S):
        mystery = pyro.sample("mystery", dist.Categorical(torch.ones(2) / 2))
        # select the chosen group's blargh prob per draw (tensor op, enum-safe)
        pm = torch.where(mystery == 0, blargh[:, 0], blargh[:, 1])
        pyro.sample("mystery_obs", dist.Bernoulli(pm), obs=torch.ones(S))


elbo = pyro.infer.TraceEnum_ELBO(max_plate_nesting=1)
marg = elbo.compute_marginals(mystery_model, lambda blargh: None, blargh)
mystery_marg = marg["mystery"]
sup = mystery_marg.enumerate_support()          # (2, S)
probs = mystery_marg.log_prob(sup).exp()         # (2, S): probs[k, s] = P(mystery=k | proto_s)
p_mystery_from_high = probs[high].mean()

ANSWER = {
    "low_antennae": float(low_means[0]),
    "low_green": float(low_means[1]),
    "low_blargh": float(low_means[2]),
    "high_antennae": float(high_means[0]),
    "high_green": float(high_means[1]),
    "high_blargh": float(high_means[2]),
    "p_mystery_from_high": float(p_mystery_from_high),
}

02 answer overlay (webppl vs pyro) · record(low_antennae, low_green, low_blargh, high_antennae, high_green, high_blargh, p_mystery_from_high)

low_antennae

0.1101

low_green

0.0888

low_blargh

0.1125

high_antennae

0.9110

high_green

0.9167

high_blargh

0.9261

p_mystery_from_high

0.9033

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0482 (record)
solver re-derivation	accept	1/2 solvers · d=[0.022, 0.057] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0272 ≤ tol 0.0963 · floors 0.0094/0.0482

probmods2-mixture-models / ex2.a

answer record(group_1_p, group_2_p) · solver accept · pyro pass 0.0023

00 statement source: exercises/mixture-models.md

given

Twenty-two participants each complete a memory test scored 0 to 45. Their scores are: [45, 45, 44, 45, 44, 45, 45, 45, 45, 45, 30, 20, 6, 44, 44, 27, 25, 17, 14, 27, 35, 30]. There are two latent groups: bona fide participants and malingerers. The bona-fide success probability is drawn uniformly from (0.5, 1). The malingerer success probability is drawn uniformly from (0, p_bona_fide), ensuring it is strictly lower. Each participant independently belongs to either group with equal prior probability (0.5 each). Each participant's score is drawn from a Binomial distribution with 45 trials and the group's success probability.

model

A two-group mixture model. Bona-fide and malingerer groups each have a latent success probability drawn from the priors above. Each participant's group is drawn with equal probability, and their score is drawn from a Binomial(45, p_group). Group success probabilities are shared across all participants of the same group within one inference run. Inference uses MCMC with 10000 samples.

query

Compute the marginal posterior distributions of the two group success probabilities. Return a record with two fields: group_1_p (the bona-fide group success probability posterior) and group_2_p (the malingerer group success probability posterior), each as a distribution over real values.
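
One way to see how the data constrain the two success probabilities is to sum out each participant's group membership and evaluate the resulting mixture log-likelihood directly. The sketch below does this in plain Python; the malingerer probability is written as group_1_p * u with u in (0, 1), which keeps it below the bona-fide probability. The function names are hypothetical and the sketch is not part of either realization.

```python
import math

scores = [45, 45, 44, 45, 44, 45, 45, 45, 45, 45, 30, 20, 6,
          44, 44, 27, 25, 17, 14, 27, 35, 30]
N_TRIALS = 45


def binomial_logpmf(k, n, p):
    """Log of the Binomial(n, p) probability of k successes, for 0 < p < 1."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)


def mixture_loglik(group_1_p, u):
    """Log-likelihood of all scores with group membership summed out (50/50 prior)."""
    group_2_p = group_1_p * u  # malingerer probability, strictly below group_1_p
    total = 0.0
    for k in scores:
        a = math.log(0.5) + binomial_logpmf(k, N_TRIALS, group_1_p)
        b = math.log(0.5) + binomial_logpmf(k, N_TRIALS, group_2_p)
        m = max(a, b)  # log-sum-exp for numerical stability
        total += m + math.log(math.exp(a - m) + math.exp(b - m))
    return total


ll_high = mixture_loglik(0.99, 0.5)  # bona-fide probability near 1
ll_low = mixture_loglik(0.60, 0.5)   # bona-fide probability far from 1
print(ll_high > ll_low)
```

A bona-fide probability near 1 fits the many scores of 44 and 45 far better than a moderate value does.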

answer spec record(group_1_p, group_2_p)

{
  "kind": "record",
  "fields": {
    "group_1_p": {
      "kind": "dist",
      "domain": "real"
    },
    "group_2_p": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var scores = [45, 45, 44, 45, 44, 45, 45, 45, 45, 45, 30, 20, 6, 44, 44, 27, 25, 17, 14, 27, 35, 30];
var subjIDs = _.range(scores.length);
var data = map(function(datum) {return _.zipObject(['subjID', 'score'], datum)}, _.zip(subjIDs, scores));
// NOTE: 10k unburned samples bias g1_p low (0.985 vs converged 0.991);
// found by the Pyro cross-language gate. Keep the burn-in.
var posterior = Infer({method: 'MCMC', samples: 50000, burn: 10000}, function() {
  var group_1_p = uniform(0.5, 1);
  var group_2_p = uniform(0, group_1_p);
  var participant2Group = mem(function(participantID) {
    return flip() ? 'group1' : 'group2';
  });
  var group2Prob = mem(function(group) {
    return group == 'group1' ? group_1_p : group_2_p;
  });

  var obsFn = function(datum){
    var p = group2Prob(participant2Group(datum.subjID));
    observe(Binomial({p: p, n: 45}), datum.score);
  };
  mapData({data: data}, obsFn);

  var participantResults_ = map(function(datum) {return participant2Group(datum.subjID)}, data);
  var participantResults = _.zipObject(_.range(participantResults_.length), participantResults_);
  return _.merge(participantResults, {group_1_p: group_1_p, group_2_p: group_2_p});
});
var ANSWER = ({
  group_1_p: marginalize(posterior, function(s) { return s.group_1_p }),
  group_2_p: marginalize(posterior, function(s) { return s.group_2_p })
});

◆ realization 0.002

python

import torch
import pyro
import pyro.distributions as dist
import pyro.infer

scores = [45, 45, 44, 45, 44, 45, 45, 45, 45, 45, 30, 20, 6, 44, 44, 27, 25, 17, 14, 27, 35, 30]
scores_t = torch.tensor(scores, dtype=torch.float64)


@pyro.infer.config_enumerate
def model():
    group_1_p = pyro.sample('group_1_p', dist.Uniform(0.5, 1.0))
    # group_2_p ~ Uniform(0, group_1_p): reparametrize as group_1_p * u, u~U(0,1).
    # Scaling U(0,1) by group_1_p yields exactly U(0, group_1_p); NUTS-friendly
    # (no latent-dependent support).
    u = pyro.sample('u', dist.Uniform(0.0, 1.0))
    group_2_p = pyro.deterministic('group_2_p', group_1_p * u)
    ps = torch.stack([group_1_p, group_2_p])
    with pyro.plate('participants', len(scores)):
        g = pyro.sample('g', dist.Bernoulli(0.5)).long()  # 0 -> group1, 1 -> group2
        p = ps[g]
        pyro.sample('score', dist.Binomial(total_count=45, probs=p), obs=scores_t)


mcmc = pyro.infer.MCMC(pyro.infer.NUTS(model), num_samples=1000, warmup_steps=500)
mcmc.run()
samples = mcmc.get_samples()
g1 = samples['group_1_p'].reshape(-1)
ANSWER = {
    'group_1_p': g1,
    'group_2_p': (g1 * samples['u'].reshape(-1)),
}

02 answer overlay: webppl vs pyro, record(group_1_p, group_2_p)

group_1_p

[overlay plot: webppl vs pyro, 24 bins over 0.98 … 1.00]

group_2_p

[overlay plot: webppl vs pyro, 24 bins over 0.44 … 0.60]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0044 (record)
solver re-derivation	accept	2/2 solvers · d=[0.006, 0.006] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0023 ≤ tol 0.0088 · floors 0.0023/0.0044

probmods2-observing-sequences / ex1.a

answer dist/finite solver accept pyro pass 0.0433

00 statement source: exercises/05-observing-sequences.md

given

Vocabulary: {'dogs', 'cats', 'chase', 'sleep', 'stop'}. Each word (including a special start token) has its own transition distribution. Each transition distribution is drawn from a symmetric Dirichlet with parameter alpha = 1 (a uniform prior over distributions on the 5 vocabulary words). An observed sentence is ['dogs', 'chase', 'cats'] (without a trailing 'stop'; the sentence generator terminates upon drawing 'stop' and does not include 'stop' in the output).

model

Words are generated sequentially. A transition distribution is sampled independently for each source word, shared across all sentences (the same source word always draws from the same memoized distribution). Starting from a special 'start' token, each successive word is drawn from the transition distribution of the current word; the sentence ends when 'stop' is drawn (and 'stop' is not included in the output).

query

Using MCMC with burn-in 10000 and 50000 posterior samples (onlyMAP: false) and a Dirichlet drift kernel (concentration 10) on each transition distribution, condition on the observed sentence and return the posterior distribution over the word that follows 'chase'.
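
Because the conditioned sentence pins down every transition it uses, the target of this query can also be sketched in closed form. The block below is a cross-check under stated assumptions, not the benchmark's method: it assumes the prior on each row is a Dirichlet with all pseudo-counts equal to 1 and applies the standard Dirichlet-categorical update (add observed counts to the pseudo-counts, then normalize). The helper name `posterior_predictive` is hypothetical.

```python
from collections import Counter
from fractions import Fraction

VOCAB = ['dogs', 'cats', 'chase', 'sleep', 'stop']


def posterior_predictive(source, transitions, alpha=1):
    """Predictive distribution of the word after `source`.

    transitions: list of observed (from_word, to_word) pairs.
    alpha: assumed symmetric Dirichlet pseudo-count per vocabulary word.
    """
    counts = Counter(to for frm, to in transitions if frm == source)
    total = alpha * len(VOCAB) + sum(counts.values())
    return {w: Fraction(alpha + counts[w], total) for w in VOCAB}


# ['dogs', 'chase', 'cats'] followed by the implicit terminating 'stop'.
observed = [('start', 'dogs'), ('dogs', 'chase'), ('chase', 'cats'), ('cats', 'stop')]
after_chase = posterior_predictive('chase', observed)
print({w: str(p) for w, p in after_chase.items()})
```

Under these assumptions 'cats' receives probability 1/3 and every other word 1/6, which gives a reference point for the sampled estimates.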

answer spec dist/finite

{
  "kind": "dist",
  "domain": "finite",
  "support": [
    "dogs",
    "cats",
    "chase",
    "sleep",
    "stop"
  ]
}

01 realizations comparing webppl vs pyro

◆ground truth

webppl

var comparray = function(arr1,arr2){
  return (JSON.stringify(arr1) === JSON.stringify(arr2));
};
var ANSWER = (Infer({method:'MCMC', burn:10000, samples: 50000, onlyMAP:false}, function() {
  let vocab = ['dogs', 'cats', 'chase', 'sleep', 'stop'];
  var wordToDistribution = mem(function(word) {
    return dirichletDrift({alpha:ones([vocab.length,1]), concentration:10});
  });
  var transition = function(word) {
    return categorical({ps: wordToDistribution(word), vs: vocab});
  };
  let obs = ['dogs', 'chase', 'cats'];
  let generateSentence = function(lastState, sentence) {
    let word = transition(lastState);
    if (word == 'stop') return [];
    return [word].concat(generateSentence(word, sentence));
  };
  condition(comparray(obs, generateSentence('start')));
  return transition('chase');
}));

◆ realization 0.043

python

import torch
import pyro
import pyro.distributions as dist
import pyro.infer

# Word-level bigram model. Each source word has a memoized transition distribution
# ~ Dirichlet(ones(5)) over the vocabulary (continuous latents). Sentence draws are
# categorical. Conditioning on the exact sentence ['dogs','chase','cats'] forces the
# transition chain start->dogs->chase->cats->stop; we observe those categorical
# draws. The query transition('chase') is a fresh draw from chase's (reweighted)
# distribution. Discrete conditioning -> Importance sampling.
vocab = ['dogs', 'cats', 'chase', 'sleep', 'stop']
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

def model():
    # Per-source-word transition distributions (only the ones we touch).
    d_start = pyro.sample('d_start', dist.Dirichlet(torch.ones(V, dtype=torch.float64)))
    d_dogs = pyro.sample('d_dogs', dist.Dirichlet(torch.ones(V, dtype=torch.float64)))
    d_chase = pyro.sample('d_chase', dist.Dirichlet(torch.ones(V, dtype=torch.float64)))
    d_cats = pyro.sample('d_cats', dist.Dirichlet(torch.ones(V, dtype=torch.float64)))
    # Observe the forced transition chain for the conditioned sentence.
    pyro.sample('t0', dist.Categorical(d_start), obs=torch.tensor(idx['dogs']))
    pyro.sample('t1', dist.Categorical(d_dogs), obs=torch.tensor(idx['chase']))
    pyro.sample('t2', dist.Categorical(d_chase), obs=torch.tensor(idx['cats']))
    pyro.sample('t3', dist.Categorical(d_cats), obs=torch.tensor(idx['stop']))
    # Query: a fresh draw from chase's transition distribution.
    q = pyro.sample('q', dist.Categorical(d_chase))
    return vocab[int(q.item())]

post = pyro.infer.Importance(model, num_samples=8000).run()
lw = torch.tensor(post.log_weights, dtype=torch.float64)
# Self-normalize the importance weights (subtract the max for numerical stability).
w = (lw - lw.max()).exp()
w = w / w.sum()
probs = {word: 0.0 for word in vocab}
for tr, wi in zip(post.exec_traces, w):
    probs[tr.nodes['_RETURN']['value']] += wi.item()

ANSWER = probs
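
The weight handling at the end of the realization above is the usual self-normalized importance sampling step. A dependency-free sketch of the same computation, assuming only a list of log-weights and a parallel list of returned labels (both function names are hypothetical), looks like this:

```python
import math


def normalize_log_weights(log_weights):
    """Turn log importance weights into weights that sum to 1.

    Subtracting the maximum first avoids overflow and underflow in exp().
    """
    top = max(log_weights)
    unnorm = [math.exp(lw - top) for lw in log_weights]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def weighted_histogram(labels, log_weights):
    """Estimate a finite distribution from weighted samples."""
    hist = {}
    for label, w in zip(labels, normalize_log_weights(log_weights)):
        hist[label] = hist.get(label, 0.0) + w
    return hist


example = weighted_histogram(['cats', 'dogs', 'cats'],
                             [math.log(1.0), math.log(2.0), math.log(1.0)])
print(example)
```

Subtracting the maximum log-weight leaves the normalized weights unchanged but keeps the exponentials in a safe range when all log-weights are very negative.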

02 answer overlay: webppl vs pyro, dist/finite

[overlay plot: webppl vs pyro, 5 bins]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0461 (tv)
solver re-derivation	accept	2/2 solvers · d=[0.026, 0.026] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0433 ≤ tol 0.1205 · floors 0.0602/0.0461
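
The distances in this table are labelled tv. Assuming this denotes total variation distance between two finite distributions (the page does not spell out the formula), a minimal sketch of that metric is shown below; `total_variation` is a hypothetical name and the two example distributions are made up.

```python
def total_variation(p, q):
    """Half the L1 distance between two distributions given as dicts."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in support)


a = {'dogs': 0.25, 'cats': 0.25, 'chase': 0.5}
b = {'dogs': 0.25, 'cats': 0.5, 'chase': 0.25}
print(total_variation(a, b))  # 0.25
```

Labels missing from one distribution are treated as having probability zero there, so the function also works when the two estimates cover different subsets of the support.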

probmods2-observing-sequences / ex1.b

answer dist/finite solver accept pyro pass 0.0451

00 statement source: exercises/05-observing-sequences.md

given

Vocabulary: {'dogs', 'cats', 'chase', 'sleep', 'stop'}. Each word (including a special start token) has its own transition distribution drawn from a symmetric Dirichlet with concentration 10 over the 5 vocabulary words (alpha = ones([5,1]), concentration = 10). These per-word transition distributions are shared across all sentences (memoized). The first observed sentence is ['dogs', 'chase', 'cats', 'stop']. The sentence generator terminates upon drawing 'stop' and includes 'stop' in the output.

model

Words are generated sequentially. Starting from a special 'start' token, each successive word is drawn from the memoized transition distribution of the current word; the sentence ends when 'stop' is drawn (included in output). A second independent sentence is generated from the same shared transition distributions.

query

Using MCMC with burn-in 10000 and 50000 posterior samples (onlyMAP: false), condition on the first observed sentence, and also condition on the first word of the second sentence being 'dogs'. Return the posterior distribution over the second word of the second sentence.

answer spec dist/finite

{
  "kind": "dist",
  "domain": "finite",
  "support": [
    "dogs",
    "cats",
    "chase",
    "sleep",
    "stop"
  ]
}

01 realizations comparing webppl vs pyro

◆ground truth

webppl

var comparray = function(arr1,arr2){
  return (JSON.stringify(arr1) === JSON.stringify(arr2));
};
var ANSWER = (Infer({method:'MCMC', burn:10000, samples: 50000, onlyMAP: false}, function() {
  let vocab = ['dogs', 'cats', 'chase', 'sleep', 'stop'];
  var wordToDistribution = mem(function(word) {
    return dirichletDrift({alpha:ones([vocab.length,1]), concentration:10});
  });
  var transition = function(word) {
    return categorical({ps: wordToDistribution(word), vs: vocab});
  };
  let generateSentence = function(lastState, sentence) {
    let word = transition(lastState);
    if (word == 'stop') return ['stop'];
    return [word].concat(generateSentence(word, sentence));
  };
  let obs = ['dogs', 'chase', 'cats', 'stop'];
  condition(comparray(obs, generateSentence('start')));
  let newSentence = generateSentence('start');
  condition(newSentence[0] == 'dogs');
  return newSentence[1];
}));

◆ realization 0.045

python

# probmods2-observing-sequences/ex1.b
# Markov sentence model: each word's transition distribution is a memoized
# Dirichlet(ones(5)) latent; transitions are Categorical over the vocab. Condition
# on the first sentence being ['dogs','chase','cats','stop'] (observed transitions),
# then on the first word of a second sentence being 'dogs'; return the posterior
# over the second word of that second sentence. Inference is run with Pyro's
# Importance sampler (continuous Dirichlet latents + observed/conditioned
# Categorical transitions); no closed-form conjugate update is used.
import torch
import pyro
import pyro.distributions as dist
import pyro.infer
from collections import defaultdict

vocab = ['dogs', 'cats', 'chase', 'sleep', 'stop']
idx = {w: i for i, w in enumerate(vocab)}
states = ['start'] + vocab
alpha = torch.ones(len(vocab))  # Dirichlet prior ones(5); concentration:10 is the
                                # drift-kernel width in WebPPL, NOT the prior alpha.

obs_sentence = ['dogs', 'chase', 'cats', 'stop']

def model():
    # memoized transition distribution per state (sampled once, reused)
    theta = {s: pyro.sample(f"theta_{s}", dist.Dirichlet(alpha)) for s in states}

    # condition on first sentence: start->dogs->chase->cats->stop (observed)
    prev = 'start'
    for t, word in enumerate(obs_sentence):
        pyro.sample(f"obs_{t}", dist.Categorical(theta[prev]),
                    obs=torch.tensor(idx[word]))
        prev = word

    # second sentence: condition first word == 'dogs' (observed), query second word
    pyro.sample("w1", dist.Categorical(theta['start']), obs=torch.tensor(idx['dogs']))
    w2 = pyro.sample("w2", dist.Categorical(theta['dogs']))
    return w2

posterior = pyro.infer.Importance(model, num_samples=50000)
posterior.run()

# Accumulate importance weights per returned word, then normalize.
weight_by_label = defaultdict(float)
total = 0.0
for tr, lw in zip(posterior.exec_traces, posterior.log_weights):
    w = float(torch.as_tensor(lw).exp())
    label = vocab[int(tr.nodes["_RETURN"]["value"])]
    weight_by_label[label] += w
    total += w

ANSWER = {lab: weight_by_label.get(lab, 0.0) / total for lab in vocab}

02 answer overlay: webppl vs pyro, dist/finite

[overlay plot: webppl vs pyro, 5 bins]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0997 (tv)
solver re-derivation	accept	2/2 solvers · d=[0.153, 0.133] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0451 ≤ tol 0.1994 · floors 0.0282/0.0997

probmods2-observing-sequences / ex1.c

answer dist/finite solver accept pyro pass 0.0641

00 statement source: exercises/05-observing-sequences.md

given

Vocabulary: {'dogs', 'cats', 'chase', 'sleep', 'stop'}. Each word (including a special start token) has its own transition distribution drawn from a symmetric Dirichlet with parameter alpha = 1 (a uniform prior over distributions on the 5 vocabulary words). These per-word transition distributions are shared across all sentences (memoized). The first observed sentence is ['dogs', 'chase', 'cats', 'stop']. The sentence generator terminates upon drawing 'stop' and includes 'stop' in the output.

model

Words are generated sequentially. Starting from a special 'start' token, each successive word is drawn from the memoized transition distribution of the current word; the sentence ends when 'stop' is drawn (included in output). A second independent sentence is generated from the same shared transition distributions.

query

Using MCMC with burn-in 10000 and 50000 posterior samples (onlyMAP: false) and a Dirichlet drift kernel (concentration 10) on each transition distribution, condition on the first observed sentence, and also condition on the second word of the second sentence being 'chase'. Return the posterior distribution over the first word of the second sentence.
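
This query runs backwards from the second word to the first, so it is an application of Bayes' rule over the five candidate first words. The sketch below works it out exactly under two assumptions that are not taken from the benchmark code: each row has a Dirichlet prior with pseudo-count 1 per word, and the rows are updated with the conjugate count rule. Because the 'start' row and the row of the first word are different, independent rows, the joint predictive is the product of the two row predictives. `predictive` and `first_word_posterior` are hypothetical names.

```python
from fractions import Fraction

VOCAB = ['dogs', 'cats', 'chase', 'sleep', 'stop']
OBSERVED = [('start', 'dogs'), ('dogs', 'chase'), ('chase', 'cats'), ('cats', 'stop')]


def predictive(source, target, alpha=1):
    """P(next word = target | source) after the conjugate count update.

    alpha is an assumed symmetric Dirichlet pseudo-count.
    """
    from_source = [to for frm, to in OBSERVED if frm == source]
    return Fraction(alpha + from_source.count(target),
                    alpha * len(VOCAB) + len(from_source))


def first_word_posterior(second_word='chase'):
    """P(first word | second word) for the new sentence, by Bayes' rule."""
    joint = {}
    for w1 in VOCAB:
        if w1 == 'stop':
            joint[w1] = Fraction(0)  # the sentence ['stop'] has no second word
        else:
            joint[w1] = predictive('start', w1) * predictive(w1, second_word)
    norm = sum(joint.values())
    return {w: p / norm for w, p in joint.items()}


print({w: str(p) for w, p in first_word_posterior().items()})
```

Under these assumptions the result is 5/9 for 'dogs', 5/36 each for 'cats' and 'chase', 1/6 for 'sleep', and 0 for 'stop'.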

answer spec dist/finite

{
  "kind": "dist",
  "domain": "finite",
  "support": [
    "dogs",
    "cats",
    "chase",
    "sleep",
    "stop"
  ]
}

01 realizations comparing webppl vs pyro

◆ground truth

webppl

var comparray = function(arr1,arr2){
  return (JSON.stringify(arr1) === JSON.stringify(arr2));
};
var ANSWER = (Infer({method:'MCMC', burn:10000, samples: 50000, onlyMAP: false}, function() {
  let vocab = ['dogs', 'cats', 'chase', 'sleep', 'stop'];
  var wordToDistribution = mem(function(word) {
    return dirichletDrift({alpha:ones([vocab.length,1]), concentration:10});
  });
  var transition = function(word) {
    return categorical({ps: wordToDistribution(word), vs: vocab});
  };
  let generateSentence = function(lastState, sentence) {
    let word = transition(lastState);
    if (word == 'stop') return ['stop'];
    return [word].concat(generateSentence(word, sentence));
  };
  let obs = ['dogs', 'chase', 'cats', 'stop'];
  condition(comparray(obs, generateSentence('start')));
  let newSentence = generateSentence('start');
  condition(newSentence[1] == 'chase');
  return newSentence[0];
}));

◆ realization 0.064

python

# probmods2-observing-sequences/ex1.c
# Same Markov sentence model with memoized Dirichlet(ones(5)) transition latents.
# Condition on the first sentence ['dogs','chase','cats','stop'] and on the second
# word of a second sentence being 'chase'; return the posterior over the FIRST word
# of that second sentence. Inference via Pyro's Importance sampler over the
# continuous Dirichlet latents and observed/conditioned Categorical transitions.
import torch
import pyro
import pyro.distributions as dist
import pyro.infer
from collections import defaultdict

vocab = ['dogs', 'cats', 'chase', 'sleep', 'stop']
idx = {w: i for i, w in enumerate(vocab)}
states = ['start'] + vocab
alpha = torch.ones(len(vocab))  # prior ones(5); concentration:10 is the drift width.

obs_sentence = ['dogs', 'chase', 'cats', 'stop']
NEG_INF = torch.tensor(float("-inf"))

def model():
    theta = {s: pyro.sample(f"theta_{s}", dist.Dirichlet(alpha)) for s in states}

    # condition on first sentence (observed transitions)
    prev = 'start'
    for t, word in enumerate(obs_sentence):
        pyro.sample(f"obs_{t}", dist.Categorical(theta[prev]),
                    obs=torch.tensor(idx[word]))
        prev = word

    # second sentence: query first word, condition second word == 'chase'
    w1 = pyro.sample("w1", dist.Categorical(theta['start']))
    w1_word = vocab[int(w1)]
    if w1_word == 'stop':
        # sentence is ['stop']: there is no second word, so 'chase' is impossible
        pyro.factor("no_w2", NEG_INF)
    else:
        # condition the second word to be 'chase' (observed transition from w1)
        pyro.sample("w2", dist.Categorical(theta[w1_word]), obs=torch.tensor(idx['chase']))
    return w1

posterior = pyro.infer.Importance(model, num_samples=50000)
posterior.run()

# Accumulate importance weights per returned word; zero-weight traces are skipped.
weight_by_label = defaultdict(float)
total = 0.0
for tr, lw in zip(posterior.exec_traces, posterior.log_weights):
    w = float(torch.as_tensor(lw).exp())
    if w == 0.0:
        continue
    label = vocab[int(tr.nodes["_RETURN"]["value"])]
    weight_by_label[label] += w
    total += w

ANSWER = {lab: weight_by_label.get(lab, 0.0) / total for lab in vocab}

02 answer overlay: webppl vs pyro, dist/finite

[overlay plot: webppl vs pyro, 4 bins]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.1445 (tv)
solver re-derivation	accept	2/2 solvers · d=[0.081, 0.049] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0641 ≤ tol 0.2890 · floors 0.0196/0.1445

probmods2-observing-sequences / ex2.a

answer dist/finite solver accept pyro pass 0.1290

00 statement source: exercises/05-observing-sequences.md

given

Vocabulary: {'dogs', 'cats', 'chase', 'sleep', 'stop'}. Each word (including a special start token) has its own transition distribution drawn from a symmetric Dirichlet with concentration 10 over the 5 vocabulary words (alpha = ones([5,1]), concentration = 10). These per-word transition distributions are shared across all sentences (memoized). The first observed sentence is ['dogs', 'chase', 'cats', 'stop']. The sentence generator terminates upon drawing 'stop' and includes 'stop' in the output.

model

Words are generated sequentially. Starting from a special 'start' token, each successive word is drawn from the memoized transition distribution of the current word; the sentence ends when 'stop' is drawn (included in output). A second independent sentence is generated from the same shared transition distributions.

query

Using MCMC with burn-in 10000 and 50000 posterior samples (onlyMAP: false), condition on the first observed sentence, and also condition on the first word of the second sentence being 'cats'. Return the posterior distribution over the second word of the second sentence.
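
How "concentration 10" is read affects the exact target of this query. The sketch below compares two readings under the conjugate Dirichlet-categorical update: a pseudo-count of 1 per word (concentration used only by the drift proposal) versus a pseudo-count of 10 per word (concentration used as the prior strength). Both readings are assumptions made for illustration, and the block does not settle which one the reference program implements; `after_cats` is a hypothetical helper.

```python
from fractions import Fraction

VOCAB = ['dogs', 'cats', 'chase', 'sleep', 'stop']


def after_cats(alpha):
    """Predictive for the word after 'cats', given one observed cats -> stop.

    alpha: assumed symmetric Dirichlet pseudo-count per word.
    """
    total = alpha * len(VOCAB) + 1
    return {w: Fraction(alpha + (1 if w == 'stop' else 0), total) for w in VOCAB}


weak_prior = after_cats(1)     # stop: 1/3, every other word: 1/6
strong_prior = after_cats(10)  # stop: 11/51, every other word: 10/51
# Half the L1 distance between the two predictive distributions.
gap = sum(abs(weak_prior[w] - strong_prior[w]) for w in VOCAB) / 2
print(float(gap))
```

The two readings differ by 2/17 (about 0.118) in half-L1 distance, so the choice of prior strength is large enough to matter when comparing realizations of this problem.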

answer spec dist/finite

{
  "kind": "dist",
  "domain": "finite",
  "support": [
    "dogs",
    "cats",
    "chase",
    "sleep",
    "stop"
  ]
}

01 realizations comparing webppl vs pyro

◆ground truth

webppl

var comparray = function(arr1,arr2){
  return (JSON.stringify(arr1) === JSON.stringify(arr2));
};
var ANSWER = (Infer({method:'MCMC', burn:10000, samples: 50000, onlyMAP: false}, function() {
  let vocab = ['dogs', 'cats', 'chase', 'sleep', 'stop'];
  var wordToDistribution = mem(function(word) {
    return dirichletDrift({alpha:ones([vocab.length,1]), concentration:10});
  });
  var transition = function(word) {
    return categorical({ps: wordToDistribution(word), vs: vocab});
  };
  let generateSentence = function(lastState, sentence) {
    let word = transition(lastState);
    if (word == 'stop') return ['stop'];
    return [word].concat(generateSentence(word, sentence));
  };
  let obs = ['dogs', 'chase', 'cats', 'stop'];
  condition(comparray(obs, generateSentence('start')));
  let newSentence = generateSentence('start');
  condition(newSentence[0] == 'cats');
  return newSentence[1];
}));

◆ realization 0.129

python

import torch
import pyro
import pyro.distributions as dist
import pyro.infer
from collections import defaultdict

# Dirichlet-Categorical word-bigram model.
# vocab = transition target set; each word has its own Dirichlet transition dist
# with alpha = ones(5) * concentration(10), matching the WebPPL reference.
# Conditioning: sentence 1 = ['dogs','chase','cats','stop'] generated from 'start',
# and sentence 2's first word = 'cats'. Query: sentence 2's second word
# = transition('cats'). Inference is run with Importance; the Dirichlet
# transition latents are sampled and every transition is conditioned via obs=,
# so the posterior over the transition distributions (and thus the next word)
# is produced by inference, not by a conjugate formula.

VOCAB = ['dogs', 'cats', 'chase', 'sleep', 'stop']
IDX = {w: i for i, w in enumerate(VOCAB)}
CONC = 10.0


def model():
    # One Dirichlet transition distribution per conditioning source state.
    # States that appear as a 'from' word in this problem: start, dogs, chase, cats.
    cache = {}

    def trans_dist(state):
        if state not in cache:
            cache[state] = pyro.sample(
                f"trans_{state}", dist.Dirichlet(torch.ones(len(VOCAB)) * CONC)
            )
        return cache[state]

    def observe_transition(name, frm, to):
        # Condition: transitioning from `frm` produced word `to`.
        pyro.sample(
            name,
            dist.Categorical(probs=trans_dist(frm)),
            obs=torch.tensor(IDX[to]),
        )

    # Sentence 1: start -> dogs -> chase -> cats -> stop
    observe_transition("s1_0", "start", "dogs")
    observe_transition("s1_1", "dogs", "chase")
    observe_transition("s1_2", "chase", "cats")
    observe_transition("s1_3", "cats", "stop")

    # Sentence 2: first word forced to 'cats' (start -> cats).
    observe_transition("s2_0", "start", "cats")

    # Second word of sentence 2 = transition('cats'); this is the query.
    second = pyro.sample("s2_1", dist.Categorical(probs=trans_dist("cats")))
    return second


posterior = pyro.infer.Importance(model, num_samples=8000).run()
log_weights = torch.tensor(posterior.log_weights)
# softmax over log-weights gives the self-normalized importance weights.
weights = torch.softmax(log_weights, dim=0)

agg = defaultdict(float)
for trace, w in zip(posterior.exec_traces, weights.tolist()):
    val = trace.nodes["s2_1"]["value"].item()
    agg[VOCAB[int(val)]] += w

total = sum(agg.values())
ANSWER = {w: agg.get(w, 0.0) / total for w in VOCAB}

02 answer overlay: webppl vs pyro, dist/finite

[overlay plot: webppl vs pyro, 5 bins]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.1343 (tv)
solver re-derivation	accept	2/2 solvers · d=[0.137, 0.121] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.1290 ≤ tol 0.2685 · floors 0.0199/0.1343

probmods2-observing-sequences / ex2.c

answer dist/finite solver accept pyro pass 0.0950

00 statement source: exercises/05-observing-sequences.md

given

Parts of speech (POS): N (nouns: 'dogs', 'cats'), V (verbs: 'chase', 'sleep'), and a terminal tag 'stop'. Each POS (including a special 'start' tag) has its own transition distribution over the three tags {N, V, stop}, drawn from a symmetric Dirichlet with concentration 10 over the 3 tags (alpha = ones([3,1]), concentration = 10). Given a POS tag, a word is drawn uniformly from that tag's word set: N draws uniformly from {'dogs', 'cats'}; V draws uniformly from {'chase', 'sleep'}; 'stop' maps to the terminal word 'stop'. POS transition distributions are memoized (shared globally).

model

Sentences are generated by a hidden Markov model. Starting at a special 'start' POS tag, the next POS is drawn from the current tag's transition distribution. A word is then emitted by drawing uniformly from the POS's word set. This continues until 'stop' is drawn as a POS, at which point 'stop' is emitted as the final word and generation ends. There is no observed evidence; sentences are sampled unconditionally.

query

Return one unconditional forward sample of the first word generated from the 'start' state (i.e., draw the next POS from the start-state transition distribution and return the corresponding word).
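
Repeating this forward draw many times traces out a distribution that can be written down directly: a symmetric Dirichlet has a uniform mean, so each tag is equally likely a priori whatever the concentration, and the word probabilities follow from the uniform emissions. A short sketch of that arithmetic (the names are illustrative, not taken from the benchmark):

```python
from fractions import Fraction

POS = ['N', 'V', 'stop']
WORDS = {'N': ['dogs', 'cats'], 'V': ['chase', 'sleep'], 'stop': ['stop']}


def first_word_marginal():
    """Marginal distribution of the first word, integrating out the Dirichlet row."""
    tag_prob = Fraction(1, len(POS))  # mean of a symmetric Dirichlet
    marginal = {}
    for tag in POS:
        for word in WORDS[tag]:
            marginal[word] = tag_prob / len(WORDS[tag])
    return marginal


print({w: str(p) for w, p in first_word_marginal().items()})
```

Each noun and verb gets 1/6 and 'stop' gets 1/3, since 'stop' is the only word its tag can emit.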

answer spec dist/finite

{
  "kind": "dist",
  "domain": "finite",
  "protocol": "draws",
  "support": [
    "dogs",
    "cats",
    "chase",
    "sleep",
    "stop"
  ]
}

01 realizations comparing webppl vs pyro

◆ground truth

webppl

var drawWord = function(pos){
  return (pos=="N") ? uniformDraw(['dogs','cats']) :
         (pos=="V") ? uniformDraw(['chase','sleep']) :
         'stop';
};
var POS = ["N", "V", "stop"];

var posToDistribution = mem(function(pos) {
  return dirichletDrift({alpha:ones([POS.length,1]), concentration:10});
});

var transition = function(pos) {
  return categorical({ps: posToDistribution(pos), vs: POS});
};

var generateSentence = function(lastPOS) {
  var nextPOS = transition(lastPOS);
  var word = drawWord(nextPOS);
  return (word == 'stop') ? [word] : [word].concat(generateSentence(nextPOS));
};
var ANSWER = (drawWord(transition("start")));

◆ realization 0.095

python

import random

import torch
import pyro
import pyro.distributions as dist

POS = ["N", "V", "stop"]
concentration = 10.0

# Memoized per-tag transition distributions, each a symmetric Dirichlet(ones*concentration).
_pos_dists = {}


def pos_to_distribution(pos):
    if pos not in _pos_dists:
        alpha = torch.ones(len(POS)) * concentration
        _pos_dists[pos] = pyro.sample(f"dir_{pos}", dist.Dirichlet(alpha))
    return _pos_dists[pos]


def transition(pos):
    probs = pos_to_distribution(pos)
    # Random suffix keeps Pyro site names unique across repeated calls.
    idx = pyro.sample(f"trans_{pos}_{random.randrange(2 ** 31)}", dist.Categorical(probs))
    return POS[int(idx)]


def draw_word(pos):
    if pos == "N":
        i = pyro.sample(f"wN_{random.randrange(2 ** 31)}", dist.Categorical(torch.ones(2)))
        return ["dogs", "cats"][int(i)]
    if pos == "V":
        i = pyro.sample(f"wV_{random.randrange(2 ** 31)}", dist.Categorical(torch.ones(2)))
        return ["chase", "sleep"][int(i)]
    return "stop"


# One unconditional forward sample of the first word from the 'start' state.
ANSWER = draw_word(transition("start"))

02 answer overlay: webppl vs pyro, dist/finite

[overlay plot: webppl vs pyro, 5 bins]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.1050 (tv)
solver re-derivation	accept	1/2 solvers · d=[0.090, —] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0950 ≤ tol 0.2200 · floors 0.0650/0.1050

probmods2-observing-sequences / ex2.d

answer dist/finite solver accept pyro pass 0.0794

00 statement source: exercises/05-observing-sequences.md

given

Parts of speech (POS): N (nouns: 'dogs', 'cats'), V (verbs: 'chase', 'sleep'), and a terminal tag 'stop'. Each POS (including a special 'start' tag) has its own transition distribution over the three tags {N, V, stop}, drawn from a symmetric Dirichlet with concentration 10 (alpha = ones([3,1]), concentration = 10). POS transition distributions are memoized (shared globally). Given a POS tag, a word is drawn uniformly from that tag's word set: N draws uniformly from {'dogs', 'cats'}; V draws uniformly from {'chase', 'sleep'}; 'stop' maps to the terminal word 'stop'. The first observed sentence is ['dogs', 'chase', 'cats', 'stop']. The sentence generator terminates upon drawing 'stop' as a POS and includes 'stop' in the output.

model

Sentences are generated by a hidden Markov model. Starting at a special 'start' POS tag, successive POS tags are drawn from the current tag's memoized transition distribution, and words are emitted from the corresponding word set. Generation ends when 'stop' is the next POS (included in the output). The same memoized POS transition distributions are shared between the observed sentence and the new sentence.

query

Using MCMC with burn-in 10000, 1000 samples, and lag 10 (onlyMAP: false), condition on the observed sentence, and also condition on the first word of a new second sentence being 'cats'. Return the posterior distribution over the second word of the new sentence.
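
Every word belongs to exactly one tag, so the observed words fix the tag sequence, and the condition that the new sentence starts with 'cats' only adds another start-to-N transition. The second word therefore depends on the N row alone. The sketch below works out that row's predictive and pushes it through the uniform emissions, assuming a Dirichlet prior with pseudo-count 1 per tag and the conjugate count update (both are assumptions for illustration, and `second_word_posterior` is a hypothetical name).

```python
from fractions import Fraction

POS = ['N', 'V', 'stop']
WORDS = {'N': ['dogs', 'cats'], 'V': ['chase', 'sleep'], 'stop': ['stop']}
# Tag transitions implied by ['dogs', 'chase', 'cats', 'stop'].
OBSERVED = [('start', 'N'), ('N', 'V'), ('V', 'N'), ('N', 'stop')]


def second_word_posterior(alpha=1):
    """Distribution of the word after a sentence-initial noun.

    alpha: assumed symmetric Dirichlet pseudo-count per tag.
    """
    from_n = [to for frm, to in OBSERVED if frm == 'N']
    total = alpha * len(POS) + len(from_n)
    result = {}
    for tag in POS:
        tag_prob = Fraction(alpha + from_n.count(tag), total)
        for word in WORDS[tag]:
            result[word] = tag_prob / len(WORDS[tag])
    return result


print({w: str(p) for w, p in second_word_posterior().items()})
```

Under these assumptions the nouns get 1/10 each, the verbs 1/5 each, and 'stop' 2/5.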

answer spec dist/finite

{
  "kind": "dist",
  "domain": "finite",
  "support": [
    "dogs",
    "cats",
    "chase",
    "sleep",
    "stop"
  ]
}

01 realizations comparing webppl vs pyro

◆ground truth

webppl

var comparray = function(arr1,arr2){
  return (JSON.stringify(arr1) === JSON.stringify(arr2));
};

var drawWord = function(pos){
  return (pos=="N") ? uniformDraw(['dogs','cats']) :
         (pos=="V") ? uniformDraw(['chase','sleep']) :
         'stop';
};
var POS = ["N", "V", "stop"];
var ANSWER = (Infer({method:'MCMC', burn:10000, samples: 1000, lag:10, onlyMAP: false}, function() {
  var posToDistribution = mem(function(pos) {
    return dirichletDrift({alpha:ones([POS.length,1]), concentration:10});
  });

  var transition = function(pos) {
    return categorical({ps: posToDistribution(pos), vs: POS});
  };

  let generateSentence = function(lastPOS) {
    let nextPOS = transition(lastPOS);
    let word = drawWord(nextPOS);
    return (word == 'stop') ? [word] : [word].concat(generateSentence(nextPOS));
  };
  let obs = ['dogs', 'chase', 'cats', 'stop'];
  condition(comparray(obs, generateSentence('start')));

  let newSentence = generateSentence('start');
  condition(newSentence[0] == 'cats');
  return newSentence[1];
}));

◆ realization 0.079

python

import torch
import pyro
import pyro.distributions as dist
import pyro.infer

# POS-level HMM. Tags {N,V,stop}; each tag (and 'start') has a memoized transition
# distribution ~ Dirichlet(ones(3)) (continuous). Words: N->unif{dogs,cats},
# V->unif{chase,sleep}, stop->'stop'. Observed words ['dogs','chase','cats','stop']
# force the POS chain [N,V,N,stop], i.e. transitions start->N, N->V, V->N, N->stop
# (each word maps to a unique POS). A new sentence is generated from the same shared
# distributions; conditioning its first word == 'cats' forces its first POS = N
# (only N emits 'cats'; the uniform emission is a constant factor), so start->N is
# observed again. The query is the new sentence's second word: draw the second POS
# from N's distribution, then emit a word from that POS. Discrete conditioning ->
# Importance sampling.
POS = ['N', 'V', 'stop']
pidx = {p: i for i, p in enumerate(POS)}
T = len(POS)
out_words = ['dogs', 'cats', 'chase', 'sleep', 'stop']
N_WORDS = ['dogs', 'cats']
V_WORDS = ['chase', 'sleep']

def draw_word(pos_i, name):
    pos = POS[pos_i]
    if pos == 'N':
        w = pyro.sample(name, dist.Categorical(torch.ones(2, dtype=torch.float64)))
        return N_WORDS[int(w.item())]
    elif pos == 'V':
        w = pyro.sample(name, dist.Categorical(torch.ones(2, dtype=torch.float64)))
        return V_WORDS[int(w.item())]
    else:
        return 'stop'

def model():
    d_start = pyro.sample('d_start', dist.Dirichlet(torch.ones(T, dtype=torch.float64)))
    d_N = pyro.sample('d_N', dist.Dirichlet(torch.ones(T, dtype=torch.float64)))
    d_V = pyro.sample('d_V', dist.Dirichlet(torch.ones(T, dtype=torch.float64)))
    # Observed sentence forces POS chain start->N->V->N->stop.
    pyro.sample('o0', dist.Categorical(d_start), obs=torch.tensor(pidx['N']))
    pyro.sample('o1', dist.Categorical(d_N), obs=torch.tensor(pidx['V']))
    pyro.sample('o2', dist.Categorical(d_V), obs=torch.tensor(pidx['N']))
    pyro.sample('o3', dist.Categorical(d_N), obs=torch.tensor(pidx['stop']))
    # New sentence: first word == 'cats' forces first POS = N (shared start dist).
    pyro.sample('n0', dist.Categorical(d_start), obs=torch.tensor(pidx['N']))
    # Query: second POS drawn from N's distribution, then emit its word.
    second_pos = pyro.sample('n1', dist.Categorical(d_N))
    word = draw_word(int(second_pos.item()), 'w1')
    return word

post = pyro.infer.Importance(model, num_samples=8000).run()
lw = torch.tensor(post.log_weights, dtype=torch.float64)
# Self-normalize the importance weights (subtract the max for numerical stability).
w = (lw - lw.max()).exp()
w = w / w.sum()
probs = {word: 0.0 for word in out_words}
for tr, wi in zip(post.exec_traces, w):
    probs[tr.nodes['_RETURN']['value']] += wi.item()

ANSWER = probs

02 answer overlay: webppl vs pyro · dist/finite

[overlay plot: webppl vs pyro, 5 bins]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.1690 (tv)
solver re-derivation	accept	2/2 solvers · d=[0.143, 0.188] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0794 ≤ tol 0.3380 · floors 0.0461/0.1690

probmods2-observing-sequences / ex3.a

answer dist/finite solver accept pyro pass 0.0746

00 statement source: exercises/05-observing-sequences.md

given

Part-of-speech tags: N (nouns: 'dog', 'cat'), V (verbs: 'chases', 'sleeps'), D (determiners: 'the', 'a'), A (adverbs: 'dilligently'), and 'stop'. The tag set is {N, V, D, A, stop}. Each tag has an associated transition distribution over this same tag set, drawn from a Dirichlet distribution with concentration 10 and a uniform pseudo-count vector of length 5 (all entries 1). These per-tag transition distributions are fixed across positions in a sentence (shared, memoized). The observed sentence is ['the', 'dog', 'chases', 'a', 'cat', 'stop']. Conditioning is soft: a generated sentence that matches it exactly receives a log-weight of 5 (a multiplicative weight of exp(5)), and any other sentence receives a log-weight of 0.

model

A hidden Markov model over POS tags generates sentences by sequentially sampling the next tag from the current tag's transition distribution, then emitting the corresponding word (deterministically for A and stop, uniformly otherwise). Sentence generation begins from a special 'start' state. The per-tag transition distributions are latent random variables shared across all positions.

query

The posterior distribution over the first POS tag in a newly generated sentence (i.e., the tag drawn by transitioning from 'start'), given the soft conditioning on the observed sentence. Use MCMC with 1000 samples, burn-in 10000, and lag 10.
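The generative process described above can be sketched in plain Python. This is only an illustration of the forward model, not the inference: the helper names (`sample_dirichlet`, `make_transition`, `generate_sentence`), the default `alpha` of all ones, and the `max_len` safety cap are assumptions introduced here and do not appear in the exercise.

```python
import random

POS = ["N", "V", "D", "A", "stop"]
# Emission vocabulary per tag; 'dilligently' is spelled as in the exercise source.
WORDS = {
    "N": ["dog", "cat"],
    "V": ["chases", "sleeps"],
    "D": ["the", "a"],
    "A": ["dilligently"],
    "stop": ["stop"],
}


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) via normalized gammas."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def make_transition(rng, alpha=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """Return a transition function whose per-tag distributions are memoized."""
    cache = {}

    def transition(pos):
        if pos not in cache:  # one shared distribution per tag, drawn lazily
            cache[pos] = sample_dirichlet(alpha, rng)
        return rng.choices(POS, weights=cache[pos])[0]

    return transition


def generate_sentence(transition, rng, last_pos="start", max_len=50):
    """Walk the tag chain from last_pos, emitting one word per tag until 'stop'."""
    words = []
    pos = last_pos
    while len(words) < max_len:  # hypothetical cap; the original recursion has none
        pos = transition(pos)
        word = rng.choice(WORDS[pos])
        words.append(word)
        if word == "stop":
            break
    return words
```

Calling `make_transition` once and reusing the returned function for both the observed and the new sentence is what makes the transition distributions shared, mirroring the `mem` call in the WebPPL ground truth.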

answer spec dist/finite

{
  "kind": "dist",
  "domain": "finite",
  "support": [
    "N",
    "V",
    "D",
    "A",
    "stop"
  ]
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var comparray = function(arr1,arr2){
  return (JSON.stringify(arr1) === JSON.stringify(arr2));
};

var drawWord = function(pos){
  return (pos=="N") ? uniformDraw(['dog','cat']) :
         (pos=="V") ? uniformDraw(['chases','sleeps']) :
         (pos=="D") ? uniformDraw(['the','a']) :
         (pos=="A") ? 'dilligently' :
         'stop';
};
var POS = ["N", "V", "D", "A", "stop"];
var ANSWER = (Infer({method:'MCMC', burn:10000, samples: 1000, lag:10}, function() {
  var posToDistribution = mem(function(pos) {
    return dirichletDrift({alpha:ones([POS.length,1]), concentration:10});
  });

  var transition = function(pos) {
    return categorical({ps: posToDistribution(pos), vs: POS});
  };

  let generateSentence = function(lastPOS) {
    let nextPOS = transition(lastPOS);
    let word = drawWord(nextPOS);
    return (word == 'stop') ? [word] : [word].concat(generateSentence(nextPOS));
  };
  let obs = ['the', 'dog', 'chases', 'a', 'cat', 'stop'];

  factor(comparray(obs, generateSentence('start'))*5);

  return transition('start');
}));

◆ realization 0.075

python

import torch
import pyro
import pyro.distributions as dist
import pyro.infer
from collections import defaultdict

# POS-tag HMM with Dirichlet transition distributions (alpha = ones(5)*10).
# Emissions are deterministic in the reverse direction here: every observed word
# maps to a unique POS, so the observed POS chain is forced:
#   the->D, dog->N, chases->V, a->D, cat->N, stop->stop.
# Rather than conditioning on an exact forward-generated string (whose prior
# probability is ~1e-6, so importance sampling with prior proposals would almost
# never produce a match and the estimate would collapse to the prior), we
# condition on the forced POS transitions directly via obs=, exactly as the
# sibling ex2.d does. The query is the first POS of a NEW sentence,
# transition('start'), which is a fresh draw from the (posterior) Dirichlet
# trans_start; the posterior is produced by running Importance inference.

POS = ["N", "V", "D", "A", "stop"]
IDX = {p: i for i, p in enumerate(POS)}
CONC = 10.0

# Observed sentence words and their forced POS tags.
# words: the dog chases a cat stop
OBS_POS = ["D", "N", "V", "D", "N", "stop"]


def model():
    cache = {}

    def trans_dist(state):
        # Memoize: one Dirichlet draw per state, shared across positions.
        if state not in cache:
            cache[state] = pyro.sample(
                f"trans_{state}", dist.Dirichlet(torch.ones(len(POS)) * CONC)
            )
        return cache[state]

    # Condition on the forced POS chain of the observed sentence:
    # start -> D -> N -> V -> D -> N -> stop
    prev = "start"
    for i, tag in enumerate(OBS_POS):
        pyro.sample(
            f"obs_{i}",
            dist.Categorical(probs=trans_dist(prev)),
            obs=torch.tensor(IDX[tag]),
        )
        prev = tag

    # Query: first POS of a new sentence = transition('start').
    first = pyro.sample("new_first", dist.Categorical(probs=trans_dist("start")))
    return first


posterior = pyro.infer.Importance(model, num_samples=8000).run()
log_weights = torch.tensor(posterior.log_weights)
weights = torch.softmax(log_weights, dim=0)

agg = defaultdict(float)
for trace, w in zip(posterior.exec_traces, weights.tolist()):
    val = trace.nodes["new_first"]["value"].item()
    agg[POS[int(val)]] += w

total = sum(agg.values())
ANSWER = {p: agg.get(p, 0.0) / total for p in POS}

02 answer overlay: webppl vs pyro · dist/finite

[overlay plot: webppl vs pyro, 5 bins]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.2410 (tv)
solver re-derivation	accept	2/2 solvers · d=[0.201, 0.131] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0746 ≤ tol 0.4820 · floors 0.0285/0.2410

probmods2-observing-sequences / ex3.b

answer dist/finite solver accept pyro pass 0.0622

00 statement source: exercises/05-observing-sequences.md

given

Vocabulary: determiners {the, a} (uniform); nouns {cat, dog} (uniform); verbs {chases, sleeps} (uniform); adverbs {diligently} (only option). Production probabilities are all uniform where a choice exists. The observed sentence has the structure [['the', 'dog'], ['chases', ['a', 'cat']]]: a noun phrase followed by a verb phrase consisting of a verb and a noun phrase. Conditioning is hard (exact match).

model

A phrase-structure grammar generates sentences recursively. A sentence (S) is a noun phrase (NP) followed by a verb phrase (VP). An NP is a determiner followed by a noun. A VP is either a verb followed by an adverb phrase (AP), or a verb followed by an NP; each option is equally likely. An AP consists of a single adverb. All terminal categories draw uniformly from their word lists.

query

Within one model: a first sentence is generated and conditioned to exactly match the observed sentence; then a SECOND sentence is generated by the same grammar, as a fresh, independent draw (the grammar has fixed production probabilities — no parameters are shared between the two sentences). Report the distribution over the second sentence's verb. Use MCMC with 1000 samples and burn-in 10000.
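Because the grammar has no shared parameters, the query can be checked by exact enumeration. The sketch below, using only the standard library, enumerates every sentence the grammar can produce, conditions the first sentence on the observation, and marginalizes the second sentence's verb. The function names (`enumerate_np`, `enumerate_vp`, `enumerate_s`, `second_verb_posterior`) are illustrative and not part of the exercise; sentences are represented as nested tuples instead of the nested arrays used in WebPPL.

```python
from fractions import Fraction
from itertools import product

DETS = ["the", "a"]
NOUNS = ["cat", "dog"]
VERBS = ["chases", "sleeps"]
ADVERBS = ["diligently"]
OBS = (("the", "dog"), ("chases", ("a", "cat")))


def enumerate_np():
    """All noun phrases (determiner, noun) with their probabilities."""
    p = Fraction(1, len(DETS) * len(NOUNS))
    return [((d, n), p) for d, n in product(DETS, NOUNS)]


def enumerate_vp():
    """All verb phrases: verb + adverb, or verb + noun phrase, each branch 1/2."""
    out = []
    p_verb = Fraction(1, len(VERBS))
    for v in VERBS:
        for a in ADVERBS:
            out.append(((v, a), p_verb * Fraction(1, 2) * Fraction(1, len(ADVERBS))))
        for np_value, p_np in enumerate_np():
            out.append(((v, np_value), p_verb * Fraction(1, 2) * p_np))
    return out


def enumerate_s():
    """All sentences (NP, VP) with their probabilities."""
    return [((np_value, vp_value), p_np * p_vp)
            for (np_value, p_np), (vp_value, p_vp) in product(enumerate_np(), enumerate_vp())]


def second_verb_posterior():
    """Posterior over the second sentence's verb given the first matches OBS."""
    sentences = enumerate_s()
    weights = {}
    for s1, p1 in sentences:
        if s1 != OBS:  # hard conditioning on the first sentence
            continue
        for s2, p2 in sentences:
            verb = s2[1][0]
            weights[verb] = weights.get(verb, Fraction(0)) + p1 * p2
    z = sum(weights.values())
    return {verb: w / z for verb, w in weights.items()}
```

Under these assumptions the second sentence is independent of the first, so conditioning leaves its verb distribution at the prior; the sampled answers on this page are Monte Carlo estimates of that distribution.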

answer spec dist/finite

{
  "kind": "dist",
  "domain": "finite",
  "support": [
    "chases",
    "sleeps"
  ]
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var comparray = function(arr1,arr2){
  return (JSON.stringify(arr1) === JSON.stringify(arr2));
};

var uniformDraw = function (xs) {return xs[randomInteger(xs.length)]};

var D  = function() {return uniformDraw(['the', 'a'])};
var N  = function() {return uniformDraw(['cat', 'dog'])};
var V  = function() {return uniformDraw(['chases', 'sleeps'])};
var A  = function() {return uniformDraw(['diligently'])};
var AP = function() {return uniformDraw([A()])};
var NP = function() {return [D(), N()]};
var VP = function() {return uniformDraw([[V(), AP()],
                                         [V(), NP()]])};
var S  = function() {return [NP(), VP()]};
var ANSWER = (Infer({method:'MCMC', burn:10000, samples: 1000}, function() {
  let obs = [['the', 'dog'], ['chases', ['a', 'cat']]];
  condition(comparray(obs, S()));

  return S()[1][0];
}));

◆ realization 0.062

python

import torch
import pyro
import pyro.distributions as dist
import pyro.infer
from collections import Counter

# Phrase-structure grammar. A first sentence S is generated and conditioned to
# match the observed sentence; a SECOND independent sentence is generated by the
# same grammar; report the distribution over the second sentence's verb (S2[1][0]).
# The model genuinely draws every choice as a pyro.sample site and conditions the
# first sentence with a hard pyro.factor; inference (Importance) produces the answer.

dets = ['the', 'a']
nouns = ['cat', 'dog']
verbs = ['chases', 'sleeps']
adverbs = ['diligently']
obs = [['the', 'dog'], ['chases', ['a', 'cat']]]


def udraw(name, xs):
    # Uniform draw from xs, recorded as a named sample site.
    i = pyro.sample(name, dist.Categorical(torch.ones(len(xs))))
    return xs[int(i)]


def gen_NP(tag):
    d = udraw(tag + '_d', dets)
    n = udraw(tag + '_n', nouns)
    return [d, n]


def gen_AP(tag):
    a = udraw(tag + '_a', adverbs)
    return [a]


def gen_VP(tag):
    v = udraw(tag + '_v', verbs)
    branch = int(pyro.sample(tag + '_branch', dist.Categorical(torch.tensor([0.5, 0.5]))))
    if branch == 0:
        return [v, gen_AP(tag + '_ap')]
    else:
        return [v, gen_NP(tag + '_vnp')]


def gen_S(tag):
    return [gen_NP(tag + '_np'), gen_VP(tag + '_vp')]


def model():
    s1 = gen_S('s1')
    # Hard condition: log-weight 0 on an exact match, -inf otherwise.
    match = 0.0 if s1 == obs else float('-inf')
    pyro.factor('cond', torch.tensor(match))
    s2 = gen_S('s2')
    # encode the second verb as a categorical index so EmpiricalMarginal can sample it
    second_verb = s2[1][0]
    return torch.tensor(float(verbs.index(second_verb)))


posterior = pyro.infer.Importance(model, num_samples=4000).run()
marg = pyro.infer.EmpiricalMarginal(posterior)
counts = Counter()
for _ in range(8000):
    counts[int(marg.sample().item())] += 1
total = sum(counts.values())
ANSWER = {verbs[i]: counts.get(i, 0) / total for i in range(len(verbs))}

02 answer overlay: webppl vs pyro · dist/finite

[overlay plot: webppl vs pyro, 2 bins]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.1820 (tv)
solver re-derivation	accept	1/2 solvers · d=[—, 0.056] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0622 ≤ tol 0.4318 · floors 0.2159/0.1820

probmods2-occams-razor / ex1.2

answer dist/finite solver accept pyro pass 0.0000

00 statement source: exercises/occams-razor.md

given

Integers 1 through 20 are in scope (maxNumber = 20). The hypothesis space is a 50/50 mixture of two kinds of concepts: (1) rule-based concepts — multiples of N for N in 1..11, powers of N for N in 1..11 (exponents start at 0, so every powers concept includes 1), all evens, all odds (24 rules total); (2) interval concepts — all integers from a through b inclusive, for every pair with 1 ≤ a < b ≤ 20. Each rule-based hypothesis is equally likely within its class; each interval hypothesis is equally likely within its class. The likelihood of a hypothesis is the size principle: each observed example is independently drawn uniformly from the concept's extension. Observed examples: [3, 10]. Test query: 12.

model

A hypothesis is drawn from the mixed prior. Each observed example is generated by drawing uniformly from the set of integers the hypothesis covers. The test query's membership in the hypothesis's set is recorded along with the hypothesis name.

query

The posterior distribution over (hypothesis name, whether the test query 12 belongs to that hypothesis's set) pairs, given the two observed examples. Hypothesis labels are strings of the form 'interval_a_b' (e.g. 'interval_1_10') for the interval [a, b]; 'multiples_of_N' (e.g. 'multiples_of_3') and 'powers_of_N' (e.g. 'powers_of_2') for the rule concepts; 'evens' and 'odds' for the parity concepts.
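The size-principle posterior is small enough to compute exactly with rational arithmetic. The following standard-library sketch follows the statement above; the function names (`extension`, `hypothesis_prior`, `posterior`, `answer`) are illustrative choices, not identifiers from the exercise.

```python
from fractions import Fraction

MAX_NUMBER = 20


def extension(label):
    """Set of integers in 1..MAX_NUMBER covered by a hypothesis label."""
    domain = range(1, MAX_NUMBER + 1)
    parts = label.split("_")
    if parts[0] == "multiples":
        base = int(parts[2])
        return {v for v in domain if v % base == 0}
    if parts[0] == "powers":
        base = int(parts[2])
        # exponents start at 0, so 1 is always included
        return {base ** e for e in range(MAX_NUMBER) if base ** e <= MAX_NUMBER}
    if label == "evens":
        return {v for v in domain if v % 2 == 0}
    if label == "odds":
        return {v for v in domain if v % 2 == 1}
    if parts[0] == "interval":
        return set(range(int(parts[1]), int(parts[2]) + 1))
    raise ValueError(label)


def hypothesis_prior():
    """50/50 mixture: uniform within the rule class and within the interval class."""
    rules = ([f"multiples_of_{n}" for n in range(1, 12)]
             + [f"powers_of_{n}" for n in range(1, 12)]
             + ["evens", "odds"])
    intervals = [f"interval_{a}_{b}"
                 for a in range(1, MAX_NUMBER + 1)
                 for b in range(a + 1, MAX_NUMBER + 1)]
    prior = {h: Fraction(1, 2 * len(rules)) for h in rules}
    prior.update({h: Fraction(1, 2 * len(intervals)) for h in intervals})
    return prior


def posterior(examples):
    """Exact posterior over hypotheses consistent with all examples."""
    scores = {}
    for h, p in hypothesis_prior().items():
        ext = extension(h)
        if all(x in ext for x in examples):
            # size principle: each example has probability 1/|extension|
            scores[h] = p * Fraction(1, len(ext)) ** len(examples)
    z = sum(scores.values())
    return {h: s / z for h, s in scores.items()}


def answer(examples, test_query):
    """Posterior over (hypothesis, test-query membership) pairs."""
    return {(h, test_query in extension(h)): p
            for h, p in posterior(examples).items()}
```

For the examples [3, 10], only `multiples_of_1` and the intervals with a ≤ 3 and b ≥ 10 survive, which gives 34 hypotheses and agrees with the 34 bins shown in the answer overlay.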

answer spec dist/finite

{
  "kind": "dist",
  "domain": "finite",
  "labels": {
    "record": {
      "hypothesis": "string",
      "testQueryResponse": "bool"
    }
  }
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var maxNumber = 20;
var filterByInRange =  function(set) {
  // NOTE: deviates from the textbook starter code, whose ranges put 0 into every
  // multiples concept and dropped maxNumber from evens/odds, contradicting the
  // stated domain 1..maxNumber. Do not 'restore' to source. See _gate_triage.md.
  var inRange = function(v) {v <= maxNumber && v >= 1};
  return _.uniq(filter(inRange, set));
};
var genEvens = function() {
  return filter(function(v) {return v % 2 == 0}, _.range(1, maxNumber + 1));
};
var genOdds = function() {
  return filter(function(v) {return (v + 1) % 2 == 0}, _.range(1, maxNumber + 1));
};
var genMultiples = function(base) {
  var multiples = map(function(v) {return base * v}, _.range(1, maxNumber + 1));
  return filterByInRange(multiples);
};
var genPowers = function(base) {
  var powers = map(function(v) {return Math.pow(base, v)}, _.range(maxNumber));
  return filterByInRange(powers);
};
var inSet = function(val, set) { return _.includes(set, val); };
var makeRuleHypothesisSpace = function() {
  var multipleRules = map(function(base) {return 'multiples_of_' + base}, _.range(1, 12));
  var powerRules = map(function(base) {return 'powers_of_' + base}, _.range(1, 12));
  return multipleRules.concat(powerRules).concat(['evens', 'odds']);
};
var genSetFromInterval = function(a, b) { return _.range(a, b+1); };

var makeIntervalHypothesisSpace = function(start, end) {
  var allIntervals = _.flatten(map(function(s) {
    return map(function(e) { [s, e] }, genSetFromInterval(s+1, end));
  }, genSetFromInterval(start, end)));
  return map(function(x) { 'interval_' + x[0] + '_' + x[1] }, allIntervals);
};

var getSetFromHypothesis = function(rule) {
  var parts = rule.split('_');
  return (parts[0] == 'multiples' ? genMultiples(_.parseInt(parts[2])) :
          parts[0] == 'powers' ? genPowers(_.parseInt(parts[2])) :
          parts[0] == 'evens' ? genEvens() :
          parts[0] == 'odds' ? genOdds() :
          parts[0] == 'interval' ? genSetFromInterval(_.parseInt(parts[1]), _.parseInt(parts[2])) :
          console.error('unknown rule' + rule));
};

var learnConcept = function(examples, testQuery) {
  return Infer({method: 'enumerate'}, function() {
    var rules = makeRuleHypothesisSpace();
    var intervals = makeIntervalHypothesisSpace(1, maxNumber);
    var hypothesis = flip(0.5) ? uniformDraw(rules) : uniformDraw(intervals);
    var set = getSetFromHypothesis(hypothesis);
    mapData({data: examples}, function(example) {
      observe(Categorical({vs: set}), example);
    });
    return {hypothesis: hypothesis,
            testQueryResponse: inSet(testQuery, set)};
  });
};
var ANSWER = (learnConcept([3, 10], 12));

◆ realization 0.000

python

import json
import math

import torch
import pyro
import pyro.distributions as dist
import pyro.infer

# probmods2-occams-razor/ex1.2
# Number-game concept learning. Hypothesis space = 50/50 mixture of rule
# concepts (multiples_of_N, powers_of_N for N in 1..11, evens, odds) and
# interval concepts (a..b, 1<=a<b<=20). Size-principle likelihood. Examples
# [3, 10]; test query 12. Posterior over (hypothesis, in-set(12)) via exact
# enumeration.

maxNumber = 20

def gen_evens():
    return sorted(set(v for v in range(1, maxNumber + 1) if v % 2 == 0))

def gen_odds():
    return sorted(set(v for v in range(1, maxNumber + 1) if (v + 1) % 2 == 0))

def gen_multiples(base):
    return sorted(set(v for v in (base * k for k in range(1, maxNumber + 1))
                      if 1 <= v <= maxNumber))

def gen_powers(base):
    return sorted(set(v for v in (base ** e for e in range(maxNumber))
                      if 1 <= v <= maxNumber))

def get_set(rule):
    parts = rule.split("_")
    if parts[0] == "multiples":
        return gen_multiples(int(parts[2]))
    if parts[0] == "powers":
        return gen_powers(int(parts[2]))
    if rule == "evens":
        return gen_evens()
    if rule == "odds":
        return gen_odds()
    if parts[0] == "interval":
        return list(range(int(parts[1]), int(parts[2]) + 1))
    raise ValueError(rule)

rule_hyps = ([f"multiples_of_{b}" for b in range(1, 12)] +
             [f"powers_of_{b}" for b in range(1, 12)] + ["evens", "odds"])
interval_hyps = [f"interval_{a}_{b}"
                 for a in range(1, maxNumber + 1) for b in range(a + 1, maxNumber + 1)]
n_rules = len(rule_hyps)
n_intervals = len(interval_hyps)

# Flatten the 50/50 mixture into one categorical prior over all hypotheses:
# 0.5 mass split uniformly within the rule class, 0.5 within the interval class.
all_hyps = rule_hyps + interval_hyps
prior = torch.cat([
    torch.full((n_rules,), 0.5 / n_rules),
    torch.full((n_intervals,), 0.5 / n_intervals),
])

examples = [3, 10]
test_query = 12

# Size-principle log-likelihood of the examples per hypothesis:
# each example drawn uniformly from the concept extension.
def loglik(hyp):
    s = set(get_set(hyp))
    if any(ex not in s for ex in examples):
        return float("-inf")
    return len(examples) * (-math.log(len(s)))

logliks = torch.tensor([loglik(h) for h in all_hyps])

@pyro.infer.config_enumerate
def model():
    h = pyro.sample("hyp", dist.Categorical(prior))
    pyro.factor("lik", logliks[h])
    return h

marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(
    model, lambda: None
)
post = marg["hyp"].probs

# Keep only hypotheses with nonzero posterior mass, keyed by the JSON record.
agg = {}
for i, hyp in enumerate(all_hyps):
    pr = post[i].item()
    if pr <= 0.0:
        continue
    tqr = test_query in set(get_set(hyp))
    key = json.dumps({"hypothesis": hyp, "testQueryResponse": tqr}, sort_keys=True)
    agg[key] = agg.get(key, 0.0) + pr

ANSWER = agg

02 answer overlay: webppl vs pyro · dist/finite

[overlay plot: webppl vs pyro, 34 bins]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0000 (tv)
solver re-derivation	accept	2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000

probmods2-occams-razor / ex1.3

answer value/realvec solver accept pyro pass 0.0000

00 statement source: exercises/occams-razor.md

given

Integers 1 through 20 are in scope (maxNumber = 20). The hypothesis space is a 50/50 mixture of rule-based concepts (multiples of N for N in 1..11, powers of N for N in 1..11 (exponents start at 0, so every powers concept includes 1), all evens, all odds) and interval concepts (all integers from a through b inclusive for every a < b in [1, 20]). Each concept hypothesis is equally likely within its class. The likelihood of a hypothesis is the size principle: each observed example is drawn uniformly from the concept's extension. Observed examples: [3, 6, 9].

model

A hypothesis is drawn from the mixed prior. Each observed example is generated by drawing uniformly from the set of integers the hypothesis covers. For a given test integer, the probability that the integer belongs to the inferred concept is the posterior expectation of membership.

query

The 20-element array of expected membership probabilities, one per integer from 1 to 20 in order, where each entry is the expected posterior probability that the integer belongs to the inferred concept given the examples [3, 6, 9].
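Written out, the quantity being reported is a posterior-weighted membership indicator. The notation below is introduced here for illustration (h ranges over hypotheses, |h| is the size of a hypothesis's extension, x_1..x_n are the observed examples, q is a test integer) and is not taken from the exercise text.

```latex
P(h \mid x_{1:n}) \;\propto\; P(h) \prod_{i=1}^{n} \frac{\mathbb{1}[x_i \in h]}{|h|},
\qquad
\mathbb{E}\big[\mathbb{1}[q \in h] \mid x_{1:n}\big]
  \;=\; \sum_{h} P(h \mid x_{1:n})\, \mathbb{1}[q \in h],
\quad q = 1, \dots, 20 .
```

Every hypothesis with nonzero posterior mass must contain all of the examples, so the entries for q = 3, 6, and 9 are exactly 1, as both answer vectors on this page show.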

answer spec value/realvec

{
  "kind": "value",
  "domain": "realvec"
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var maxNumber = 20;
var filterByInRange =  function(set) {
  // NOTE: deviates from the textbook starter code, whose ranges put 0 into every
  // multiples concept and dropped maxNumber from evens/odds, contradicting the
  // stated domain 1..maxNumber. Do not 'restore' to source. See _gate_triage.md.
  var inRange = function(v) {v <= maxNumber && v >= 1};
  return _.uniq(filter(inRange, set));
};
var genEvens = function() {
  return filter(function(v) {return v % 2 == 0}, _.range(1, maxNumber + 1));
};
var genOdds = function() {
  return filter(function(v) {return (v + 1) % 2 == 0}, _.range(1, maxNumber + 1));
};
var genMultiples = function(base) {
  var multiples = map(function(v) {return base * v}, _.range(1, maxNumber + 1));
  return filterByInRange(multiples);
};
var genPowers = function(base) {
  var powers = map(function(v) {return Math.pow(base, v)}, _.range(maxNumber));
  return filterByInRange(powers);
};
var inSet = function(val, set) { return _.includes(set, val); };
var makeRuleHypothesisSpace = function() {
  var multipleRules = map(function(base) {return 'multiples_of_' + base}, _.range(1, 12));
  var powerRules = map(function(base) {return 'powers_of_' + base}, _.range(1, 12));
  return multipleRules.concat(powerRules).concat(['evens', 'odds']);
};
var genSetFromInterval = function(a, b) { return _.range(a, b+1); };
var makeIntervalHypothesisSpace = function(start, end) {
  var allIntervals = _.flatten(map(function(s) {
    return map(function(e) { [s, e] }, genSetFromInterval(s+1, end));
  }, genSetFromInterval(start, end)));
  return map(function(x) { 'interval_' + x[0] + '_' + x[1] }, allIntervals);
};
var getSetFromHypothesis = function(rule) {
  var parts = rule.split('_');
  return (parts[0] == 'multiples' ? genMultiples(_.parseInt(parts[2])) :
          parts[0] == 'powers' ? genPowers(_.parseInt(parts[2])) :
          parts[0] == 'evens' ? genEvens() :
          parts[0] == 'odds' ? genOdds() :
          parts[0] == 'interval' ? genSetFromInterval(_.parseInt(parts[1]), _.parseInt(parts[2])) :
          console.error('unknown rule' + rule));
};
var learnConcept = function(examples, testQuery) {
  return Infer({method: 'enumerate'}, function() {
    var rules = makeRuleHypothesisSpace();
    var intervals = makeIntervalHypothesisSpace(1, maxNumber);
    var hypothesis = flip(0.5) ? uniformDraw(rules) : uniformDraw(intervals);
    var set = getSetFromHypothesis(hypothesis);
    mapData({data: examples}, function(example) {
      observe(Categorical({vs: set}), example);
    });
    return {hypothesis: hypothesis,
            testQueryResponse: inSet(testQuery, set)};
  });
};

var examples = [3, 6, 9];
var queries = genSetFromInterval(1, maxNumber);
var ANSWER = (map(function(query) {
  var post = learnConcept(examples, query);
  return expectation(marginalize(post, function(x) { x.testQueryResponse }));
}, queries));

◆ realization 0.000

python

import math

import torch
import pyro
import pyro.distributions as dist
import pyro.infer

# Number-game concept learning over an enumerable hypothesis space.
# Prior: 50/50 mixture of rule concepts (24) and interval concepts (190).
# Likelihood: size principle (each example uniform over the concept extension).
# We enumerate the hypothesis with Pyro's exact discrete inference
# (config_enumerate + TraceEnum_ELBO.compute_marginals) and read the expected
# membership probability for each integer 1..20 off the posterior over concepts.
#
# NOTE: the WebPPL GT deliberately deviates from the textbook ranges (it keeps
# integers in [1, maxNumber]); we reproduce the GT ranges exactly here and do
# NOT restore textbook behavior.

maxNumber = 20

def filterByInRange(values):
    # Keep values in [1, maxNumber], dropping duplicates while preserving order.
    seen = []
    for v in values:
        if 1 <= v <= maxNumber and v not in seen:
            seen.append(v)
    return seen

def genEvens():
    return [v for v in range(1, maxNumber + 1) if v % 2 == 0]

def genOdds():
    return [v for v in range(1, maxNumber + 1) if (v + 1) % 2 == 0]

def genMultiples(base):
    return filterByInRange([base * v for v in range(1, maxNumber + 1)])

def genPowers(base):
    # exponents start at 0 (range(maxNumber) = 0..19), so 1 is always included
    return filterByInRange([base ** v for v in range(maxNumber)])

def genSetFromInterval(a, b):
    return list(range(a, b + 1))

# Build the hypothesis space: each entry is (label, list of integers in its extension).
rule_specs = []
for base in range(1, 12):
    rule_specs.append(("multiples_of_" + str(base), genMultiples(base)))
for base in range(1, 12):
    rule_specs.append(("powers_of_" + str(base), genPowers(base)))
rule_specs.append(("evens", genEvens()))
rule_specs.append(("odds", genOdds()))

interval_specs = []
for s in range(1, maxNumber + 1):
    for e in range(s + 1, maxNumber + 1):
        interval_specs.append(("interval_" + str(s) + "_" + str(e), genSetFromInterval(s, e)))

n_rules = len(rule_specs)
n_intervals = len(interval_specs)

# Convert each extension to a set for fast membership tests.
hypotheses = [(lab, set(ext)) for lab, ext in rule_specs] + \
             [(lab, set(ext)) for lab, ext in interval_specs]
n_hyp = len(hypotheses)

# Marginal prior over hypotheses: 0.5 split over rules, 0.5 split over intervals.
prior = torch.zeros(n_hyp)
for i in range(n_rules):
    prior[i] = 0.5 / n_rules
for j in range(n_intervals):
    prior[n_rules + j] = 0.5 / n_intervals

examples = [3, 6, 9]

# Size-principle log-likelihood of the examples for each hypothesis.
loglik = torch.full((n_hyp,), float("-inf"))
for i, (lab, ext) in enumerate(hypotheses):
    size = len(ext)
    if size == 0:
        continue
    if all(x in ext for x in examples):
        loglik[i] = -len(examples) * math.log(size)

@pyro.infer.config_enumerate
def model():
    h = pyro.sample("h", dist.Categorical(prior))
    pyro.factor("size_principle", loglik[h])
    return h

marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)["h"]
sup = marg.enumerate_support()
post_probs = marg.log_prob(sup).exp()
posterior = torch.zeros(n_hyp)
for s, p in zip(sup, post_probs):
    posterior[int(s.item())] = p

# Expected membership: for each query integer q, sum the posterior mass of
# every hypothesis whose extension contains q.
queries = genSetFromInterval(1, maxNumber)
ANSWER = []
for q in queries:
    expected = 0.0
    for i, (lab, ext) in enumerate(hypotheses):
        if q in ext:
            expected += float(posterior[i].item())
    ANSWER.append(expected)

02 answer · value/realvec

webppl

[0.1094, 0.2326, 1.0000, 0.4010, 0.4010, 1.0000, 0.4010, 0.4010, 1.0000, 0.2990, 0.2284, 0.7763, 0.1392, 0.1101, 0.6862, 0.0690, 0.0542, 0.6410, 0.0319, 0.0234]

pyro

[0.1094, 0.2326, 1.0000, 0.4010, 0.4010, 1.0000, 0.4010, 0.4010, 1.0000, 0.2990, 0.2284, 0.7763, 0.1392, 0.1101, 0.6862, 0.0690, 0.0542, 0.6410, 0.0319, 0.0234]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0000 (absdiff)
solver re-derivation	accept	1/2 solvers · d=[0.000, —] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000

probmods2-occams-razor / ex2.1

answer record(relation, meanCp, meanB) solver accept pyro pass 0.0212

00 statement source: exercises/occams-razor.md

given

Observed data: one trial where C = true and E = false. Priors: causal relation present with probability 0.5; causal power cp drawn uniformly from [0, 1]; background rate b drawn uniformly from [0, 1]. MCMC: 10000 samples, lag 2.

model

Whether a causal relation exists, the causal power of C on E, and the background rate of E are all latent. When the relation is present, E is caused by C with probability cp or occurs due to background with probability b (noisy-OR). When the relation is absent, E occurs only due to background with probability b. Each trial's outcome is observed under this mechanism.

query

From the posterior over (relation present, cp, b) given the observed data: the marginal distribution over whether the causal relation is present, the posterior mean of cp, and the posterior mean of b.
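With a single observation (C = true, E = false) the posterior is simple enough to check numerically. Under the noisy-OR mechanism above, the likelihood is (1 − b)(1 − cp) when the relation is present and (1 − b) when it is absent. The sketch below integrates this on a midpoint grid; the function name `posterior_summary` and the grid size `n` are assumptions made for illustration, and grid integration is swapped in here in place of the MCMC used by the exercise.

```python
def posterior_summary(n=200):
    """Midpoint-rule integration of the ex2.1 posterior over (relation, cp, b)."""
    grid = [(i + 0.5) / n for i in range(n)]
    cell = 1.0 / (n * n)  # area of one (cp, b) grid cell
    mass = {True: 0.0, False: 0.0}
    sum_cp = 0.0
    sum_b = 0.0
    for relation in (True, False):
        for cp in grid:
            for b in grid:
                # P(E = false | C = true): no background event, and no causal
                # event when the relation is present.
                lik = (1.0 - b) * ((1.0 - cp) if relation else 1.0)
                w = 0.5 * lik * cell  # prior P(relation) = 0.5, uniform cp and b
                mass[relation] += w
                sum_cp += w * cp
                sum_b += w * b
    total = mass[True] + mass[False]
    return {
        "relation": {True: mass[True] / total, False: mass[False] / total},
        "meanCp": sum_cp / total,
        "meanB": sum_b / total,
    }
```

Working the same integrals by hand gives P(relation) = 1/3, a posterior mean of cp of 4/9, and a posterior mean of b of 1/3, which the sampled estimates reported on this page (0.4534 and 0.3376) approximate.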

answer spec record(relation, meanCp, meanB)

{
  "kind": "record",
  "fields": {
    "relation": {
      "kind": "dist",
      "domain": "bool"
    },
    "meanCp": {
      "kind": "value",
      "domain": "real",
      "estimated": true
    },
    "meanB": {
      "kind": "value",
      "domain": "real",
      "estimated": true
    }
  }
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var observedData = [{C:true, E:false}];
var posterior = Infer({method: 'MCMC', samples: 10000, lag:2}, function() {
  var relation = flip();
  var cp = uniform(0, 1);
  var b = uniform(0, 1);

  mapData({data: observedData}, function(datum) {
    var E = (relation && datum.C && flip(cp)) || flip(b);
    condition(E == datum.E);
  });

  return {relation, cp, b};
});
var ANSWER = ({
  relation: marginalize(posterior, function(x) { return x.relation }),
  meanCp: expectation(marginalize(posterior, function(x) { return x.cp })),
  meanB: expectation(marginalize(posterior, function(x) { return x.b }))
});

◆ realization 0.021

python

import torch
import pyro
import pyro.distributions as dist
import pyro.infer
import pyro.poutine

# One trial: C=true, E=false. Latents: relation (bool), cp~U(0,1), b~U(0,1).
# Noisy-OR: E = (relation & flip(cp)) | flip(b); condition E == false (given C=true).
# Continuous latents cp,b are sampled by NUTS; the discrete relation and the inner
# noisy-OR flips are marginalized by enumeration. The relation marginal is then
# recovered with Pyro's infer_discrete over the NUTS posterior of (cp,b).
NEG_INF = torch.tensor(float('-inf'), dtype=torch.float64)
ZERO = torch.tensor(0.0, dtype=torch.float64)

@pyro.infer.config_enumerate
def model():
    cp = pyro.sample('cp', dist.Uniform(0.0, 1.0))
    b = pyro.sample('b', dist.Uniform(0.0, 1.0))
    relation = pyro.sample('relation', dist.Bernoulli(0.5))
    x = pyro.sample('x', dist.Bernoulli(cp))  # flip(cp)
    y = pyro.sample('y', dist.Bernoulli(b))   # flip(b)
    # C = true is fixed; E = (relation & x) | y
    E = (relation.bool() & x.bool()) | y.bool()
    # condition E == false
    pyro.factor('obs', torch.where(~E, ZERO, NEG_INF))

kernel = pyro.infer.NUTS(model)
mcmc = pyro.infer.MCMC(kernel, num_samples=1000, warmup_steps=600)
mcmc.run()
samples = mcmc.get_samples()
cp_s = samples['cp'].to(torch.float64)
b_s = samples['b'].to(torch.float64)
num = cp_s.shape[0]

meanCp = cp_s.mean().item()
meanB = b_s.mean().item()

# Relation marginal: condition the enumerated model on the posterior (cp,b)
# samples (placed in a plate) and let Pyro's infer_discrete draw relation.
def vec_model():
    with pyro.plate('particles', num, dim=-1):
        cp = pyro.sample('cp', dist.Uniform(0.0, 1.0))
        b = pyro.sample('b', dist.Uniform(0.0, 1.0))
        relation = pyro.sample('relation', dist.Bernoulli(0.5))
        x = pyro.sample('x', dist.Bernoulli(cp))
        y = pyro.sample('y', dist.Bernoulli(b))
        E = (relation.bool() & x.bool()) | y.bool()
        pyro.factor('obs', torch.where(~E, ZERO, NEG_INF))

cond = pyro.poutine.condition(vec_model, data={'cp': cp_s, 'b': b_s})
serving = pyro.infer.infer_discrete(pyro.infer.config_enumerate(cond), first_available_dim=-2)
tr = pyro.poutine.trace(serving).get_trace()
relation_draws = tr.nodes['relation']['value'].reshape(-1).to(torch.float64)
p_true = relation_draws.mean().item()

ANSWER = {
    'relation': {True: p_true, False: 1.0 - p_true},
    'meanCp': meanCp,
    'meanB': meanB,
}

02 answer overlay: webppl vs pyro, record(relation, meanCp, meanB)

relation: [bar chart: webppl vs pyro, 2 bins]

meanCp: 0.4534

meanB: 0.3376

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0331 (record)
solver re-derivation	accept	2/2 solvers · d=[0.034, 0.034] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0212 ≤ tol 0.0662 · floors 0.0330/0.0331

probmods2-occams-razor / ex2.3

answer record(cpValues, csValues) solver accept pyro pass 0.0515

00 statement source: exercises/occams-razor.md

given

Fifteen data configurations are defined by pairs (numEWithC, numEWithoutC) for a 16-trial dataset (8 trials with C=true, 8 with C=false): [[8,8],[6,6],[4,4],[2,2],[0,0],[8,6],[6,4],[4,2],[2,0],[8,4],[6,2],[4,0],[8,2],[6,0],[8,0]]. For each configuration, the dataset contains numEWithC trials of (C=true, E=true), (8 − numEWithC) trials of (C=true, E=false), numEWithoutC trials of (C=false, E=true), and (8 − numEWithoutC) trials of (C=false, E=false). Causal Power (CP) model: latents cp ~ Uniform(0,1) and b ~ Uniform(0,1); effect E follows a noisy-OR mechanism — E is true if (C=true and a Bernoulli(cp) event occurs) or a Bernoulli(b) event occurs — with the analytic marginal of E used for likelihood (inner enumeration). MCMC: burn-in 2000, 1000 samples, lag 2. Causal Support (CS) model: same structure, but additionally a latent relation ~ Bernoulli(0.5); when relation is false, C has no effect on E. The CS posterior quantity of interest is the product relation × cp.

model

The CP model infers causal power and background rate from the observed data under the noisy-OR mechanism. The CS model additionally infers whether any causal relationship exists. Both models use the same analytic marginalization of E for efficiency.

query

For each of the 15 data configurations in order: the posterior expected value of cp under the CP model (cpValues) and the posterior expected value of relation × cp under the CS model (csValues). Return these as two parallel arrays.
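The pieces described above (dataset expansion, the noisy-OR marginal of E, and the resulting likelihood) can be sketched in plain Python. This is only an illustrative sketch: the function names are hypothetical, and `grid_mean_cp` swaps in a coarse midpoint-grid approximation of E[cp | data] for the CP model instead of the MCMC used by the ground truth, so its numbers are expected to be close to, not identical with, the reported estimates.

```python
import math


def generate_data(num_e_with_c, num_e_without_c, trials_per_condition=8):
    """Expand one (numEWithC, numEWithoutC) pair into a list of (C, E) trials."""
    return (
        [(True, True)] * num_e_with_c
        + [(True, False)] * (trials_per_condition - num_e_with_c)
        + [(False, True)] * num_e_without_c
        + [(False, False)] * (trials_per_condition - num_e_without_c)
    )


def noisy_or_p_e(c, cp, b, relation=True):
    """P(E = true | C): 1 - (1 - b) * (1 - [relation and C] * cp)."""
    cause_active = 1.0 if (relation and c) else 0.0
    return 1.0 - (1.0 - b) * (1.0 - cause_active * cp)


def log_likelihood(data, cp, b, relation=True):
    """Sum of Bernoulli log-probabilities of the observed E values."""
    total = 0.0
    for c, e in data:
        p = noisy_or_p_e(c, cp, b, relation)
        total += math.log(p if e else 1.0 - p)
    return total


def grid_mean_cp(data, n=40):
    """E[cp | data] under the CP model on an n x n midpoint grid (not MCMC)."""
    points = [(i + 0.5) / n for i in range(n)]  # midpoints avoid log(0)
    weighted_sum = 0.0
    normalizer = 0.0
    for cp in points:
        for b in points:
            w = math.exp(log_likelihood(data, cp, b))  # uniform priors cancel
            weighted_sum += cp * w
            normalizer += w
    return weighted_sum / normalizer
```

For example, `grid_mean_cp(generate_data(8, 0))` gives the grid estimate for the last configuration, and passing `relation=False` to `noisy_or_p_e` reduces the marginal to the background rate `b`, which is how the CS model switches the cause off.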

answer spec record(cpValues, csValues)

{
  "kind": "record",
  "fields": {
    "cpValues": {
      "kind": "value",
      "domain": "realvec",
      "estimated": true
    },
    "csValues": {
      "kind": "value",
      "domain": "realvec",
      "estimated": true
    }
  }
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var generateData = function(numEWithC, numEWithoutC) {
  var eWithC = repeat(numEWithC, function() {return {C: true, E: true}});
  var noEWithC = repeat(8 - numEWithC, function() {return {C: true, E: false}});
  var eWithoutC = repeat(numEWithoutC, function() {return {C: false, E: true}});
  var noEWithoutC = repeat(8 - numEWithoutC, function() {return {C: false, E: false}});
  return _.flatten([eWithC, noEWithC, eWithoutC, noEWithoutC]);
};

var dataParams = [[8, 8], [6, 6], [4, 4], [2, 2], [0, 0], [8, 6],
                  [6, 4], [4, 2], [2, 0], [8, 4], [6, 2], [4, 0],
                  [8, 2], [6, 0], [8, 0]];

var data = map(function(x) { generateData(x[0], x[1]) }, dataParams);

var cpPost = function(observedData) {
  return Infer({method: 'MCMC', burn: 2000, samples: 1000, lag:2}, function() {
    var cp = uniform(0, 1);
    var b = uniform(0, 1);
    var noisyOrMarginal = function(C) {
      return Infer({method: 'enumerate'}, function() {
        return (C && flip(cp)) || flip(b);
      });
    };
    mapData({data: observedData}, function(datum) {
      observe(noisyOrMarginal(datum.C), datum.E);
    });
    return cp;
  });
};

var csPost = function(observedData) {
  return Infer({method: 'MCMC', burn: 2000, samples: 1000, lag:2}, function() {
    var relation = flip();
    var cp = uniform(0, 1);
    var b = uniform(0, 1);
    var noisyOrMarginal = function(C) {
      return Infer({method: 'enumerate'}, function() {
        return (relation && C && flip(cp)) || flip(b);
      });
    };
    mapData({data: observedData}, function(datum) {
      observe(noisyOrMarginal(datum.C), datum.E);
    });
    return relation * cp;
  });
};
var ANSWER = (({
  cpValues: map(function(d) { expectation(cpPost(d)) }, data),
  csValues: map(function(d) { expectation(csPost(d)) }, data)
}));

◆ realization 0.052

python

import math

import torch
import pyro
import pyro.distributions as dist

# Causal-power (CP) vs causal-support (CS) models over 15 data configurations.
# Each dataset's continuous posterior is drawn with NUTS (the family WebPPL draws
# with MCMC).  The noisy-OR marginal P(E=1) = 1 - (1-b)*(1 - [relation*]C*cp) is the
# enumerate-marginalized inner Infer of the WebPPL model; it is observed as a
# Bernoulli.  In the CS model the discrete `relation` is marginalized with a
# logsumexp mixture in pyro.factor so NUTS samples only the continuous cp, b; the
# queried E[relation*cp] is then recovered by drawing `relation` from its posterior
# with pyro.infer.infer_discrete, conditioning a config_enumerate model on each
# NUTS (cp, b) draw (plated over the draws) and observing the same data.
# NUTS is kept lean (400 samples / 200 warmup) so all 15 datasets x 2 models fit
# the seed budget.

NUM_SAMPLES = 400
WARMUP = 200

data_params = [[8, 8], [6, 6], [4, 4], [2, 2], [0, 0], [8, 6],
               [6, 4], [4, 2], [2, 0], [8, 4], [6, 2], [4, 0],
               [8, 2], [6, 0], [8, 0]]


def make_data(num_e_with_c, num_e_without_c):
    # 8 trials with C=1, 8 trials with C=0.
    c = torch.cat([torch.ones(8), torch.zeros(8)])
    e = torch.cat([
        torch.ones(num_e_with_c), torch.zeros(8 - num_e_with_c),
        torch.ones(num_e_without_c), torch.zeros(8 - num_e_without_c),
    ])
    return c, e


def cp_model(c, e):
    cp = pyro.sample("cp", dist.Uniform(0.0, 1.0))
    b = pyro.sample("b", dist.Uniform(0.0, 1.0))
    # noisy-OR marginal: P(E) = 1 - (1-b)*(1 - C*cp)
    p_e = (1.0 - (1.0 - b) * (1.0 - c * cp)).clamp(1e-9, 1 - 1e-9)
    with pyro.plate("data", c.shape[0]):
        pyro.sample("obs", dist.Bernoulli(p_e), obs=e)


def cs_cont_model(c, e):
    # relation marginalized out of the likelihood (mixture of its two settings),
    # so the continuous latents cp, b are what NUTS explores.
    cp = pyro.sample("cp", dist.Uniform(0.0, 1.0))
    b = pyro.sample("b", dist.Uniform(0.0, 1.0))
    p_e1 = (1.0 - (1.0 - b) * (1.0 - c * cp)).clamp(1e-9, 1 - 1e-9)  # relation = 1
    p_e0 = (1.0 - (1.0 - b)).clamp(1e-9, 1 - 1e-9)                    # relation = 0 -> p_e = b
    ll1 = dist.Bernoulli(p_e1).log_prob(e).sum()
    ll0 = dist.Bernoulli(p_e0).log_prob(e).sum()
    # log p(data) marginalizing relation ~ Bernoulli(0.5)
    log_mix = torch.logsumexp(torch.stack([ll1 + math.log(0.5), ll0 + math.log(0.5)]), dim=0)
    pyro.factor("obs", log_mix)


def cp_expectation(c, e):
    kernel = pyro.infer.NUTS(cp_model, jit_compile=False)
    mcmc = pyro.infer.MCMC(kernel, num_samples=NUM_SAMPLES, warmup_steps=WARMUP, disable_progbar=True)
    mcmc.run(c, e)
    return mcmc.get_samples()["cp"].mean().item()


@pyro.infer.config_enumerate
def cs_discrete_model(c, e, n_draws):
    # cp, b are conditioned to the NUTS draws (poutine.condition below); relation
    # is the only free latent and is enumerated/sampled by infer_discrete.
    with pyro.plate("draws", n_draws, dim=-2):
        relation = pyro.sample("relation", dist.Bernoulli(0.5))   # binary causal link
        cp = pyro.sample("cp", dist.Uniform(0.0, 1.0))            # conditioned -> (n_draws,1)
        b = pyro.sample("b", dist.Uniform(0.0, 1.0))             # conditioned -> (n_draws,1)
        with pyro.plate("trials", c.shape[0], dim=-1):
            p_e = 1.0 - (1.0 - b) * (1.0 - relation * c * cp)
            p_e = p_e.clamp(1e-9, 1 - 1e-9)
            pyro.sample("obs", dist.Bernoulli(p_e), obs=e)


def cs_expectation(c, e):
    kernel = pyro.infer.NUTS(cs_cont_model, jit_compile=False)
    mcmc = pyro.infer.MCMC(kernel, num_samples=NUM_SAMPLES, warmup_steps=WARMUP, disable_progbar=True)
    mcmc.run(c, e)
    s = mcmc.get_samples()
    cp = s["cp"]
    b = s["b"]
    n_draws = cp.shape[0]
    # Recover relation's posterior with Pyro's discrete inference: condition the
    # enumeration model on each NUTS (cp, b) draw (plated over the draws) and let
    # infer_discrete sample relation from P(relation | cp, b, data).
    cond = pyro.poutine.condition(
        cs_discrete_model,
        data={"cp": cp.reshape(n_draws, 1), "b": b.reshape(n_draws, 1)},
    )
    inferred = pyro.infer.infer_discrete(cond, first_available_dim=-3)
    trace = pyro.poutine.trace(inferred).get_trace(c, e, n_draws)
    relation = trace.nodes["relation"]["value"].reshape(n_draws).to(cp.dtype)
    return (relation * cp).mean().item()


cp_values = []
cs_values = []
for a, b_ in data_params:
    c, e = make_data(a, b_)
    cp_values.append(cp_expectation(c, e))
    cs_values.append(cs_expectation(c, e))

ANSWER = {"cpValues": cp_values, "csValues": cs_values}

02 answer overlay: webppl vs pyro, record(cpValues, csValues)

cpValues

[0.5557, 0.3703, 0.2700, 0.1892, 0.1060, 0.6359, 0.4481, 0.3368, 0.2543, 0.7727, 0.5425, 0.4281, 0.8454, 0.6593, 0.8873]

csValues

[0.3430, 0.1530, 0.1022, 0.0493, 0.0131, 0.5023, 0.2967, 0.1629, 0.1322, 0.7142, 0.4885, 0.3616, 0.8495, 0.6535, 0.8830]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0718 (record)
solver re-derivation	accept	2/2 solvers · d=[0.047, 0.044] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0515 ≤ tol 0.1592 · floors 0.0796/0.0718

probmods2-social-cognition / ex1.1

answer dist/finite solver accept pyro pass 0.0000

00 statement source: exercises/social-cognition.md

given

Three actions {a, b, c} and three food outcomes {bagel, cookie, doughnut}, each with prior probability 1/3. The vending machine transition: action a gives bagel with probability 0.8 and each of the others with probability 0.1; action b gives cookie with probability 0.8 and each of the others with probability 0.1; action c gives doughnut with probability 0.8 and each of the others with probability 0.1. Sally is deceptive with probability 0.5.

model

Sally has a goal food drawn from the prior. When not deceptive, she chooses an action with probability proportional to the probability that the action produces her goal food. When deceptive, she chooses an action with probability proportional to the probability that the action does NOT produce her goal food. The observer infers Sally's goal food from observing that she is deceptive and that she chose action b.

query

The posterior distribution over Sally's goal food, given that she is deceptive and chose action b.
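Because every quantity here is finite, the posterior can be checked by hand with a few lines of plain Python. The sketch below is illustrative only (the helper names `action_policy` and `goal_posterior` are hypothetical); it assumes the uniform action and food priors stated above and computes Sally's policy by normalizing the per-action match or mismatch probabilities.

```python
FOODS = ["bagel", "cookie", "doughnut"]
ACTIONS = ["a", "b", "c"]
# P(food | action) for the vending machine described in the statement.
VENDING = {
    "a": {"bagel": 0.8, "cookie": 0.1, "doughnut": 0.1},
    "b": {"bagel": 0.1, "cookie": 0.8, "doughnut": 0.1},
    "c": {"bagel": 0.1, "cookie": 0.1, "doughnut": 0.8},
}


def action_policy(goal, deceive):
    """P(action | goal, deceive) under a uniform action prior."""
    weights = {}
    for action in ACTIONS:
        p_goal = VENDING[action][goal]
        # Deceptive agents weight actions by the chance of NOT producing the goal.
        weights[action] = (1.0 - p_goal) if deceive else p_goal
    total = sum(weights.values())
    return {action: w / total for action, w in weights.items()}


def goal_posterior(observed_action, deceive=True):
    """P(goal | deceive, observed_action) with a uniform prior over foods."""
    scores = {
        goal: action_policy(goal, deceive)[observed_action] / len(FOODS)
        for goal in FOODS
    }
    total = sum(scores.values())
    return {goal: s / total for goal, s in scores.items()}


posterior = goal_posterior("b")
```

With a deceptive Sally choosing b, the policy assigns b probability 0.9 / 2.0 = 0.45 when the goal is bagel or doughnut and 0.2 / 2.0 = 0.1 when it is cookie, so `posterior` comes out as bagel 0.45, cookie 0.10, doughnut 0.45.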

answer spec dist/finite

{
  "kind": "dist",
  "domain": "finite",
  "support": [
    "bagel",
    "cookie",
    "doughnut"
  ]
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var actionPrior = Categorical({vs: ['a', 'b', 'c'], ps: [1/3, 1/3, 1/3]});
var foodPrior = Categorical({vs: ['bagel', 'cookie', 'doughnut'], ps: [1/3, 1/3, 1/3]});

var vendingMachine = function(state, action) {
  return action == 'a' ? categorical({vs: ['bagel', 'cookie', 'doughnut'], ps: [.8, .1, .1]}) :
         action == 'b' ? categorical({vs: ['bagel', 'cookie', 'doughnut'], ps: [.1, .8, .1]}) :
         action == 'c' ? categorical({vs: ['bagel', 'cookie', 'doughnut'], ps: [.1, .1, .8]}) :
         'nothing';
};

var chooseAction = function(goal, transition, state, deceive) {
  return Infer({method: 'enumerate'}, function() {
    var action = sample(actionPrior);
    var outcome = transition(state, action);
    condition(deceive ? !goal(outcome) : goal(outcome));
    return action;
  });
};
var ANSWER = (Infer({method: 'enumerate'}, function() {
  var deceive = flip();
  var goalFood = sample(foodPrior);
  var goal = function(outcome) {return outcome == goalFood};
  var sallyActionDist = chooseAction(goal, vendingMachine, 'state', deceive);
  condition(deceive);
  condition(sample(sallyActionDist) == 'b');
  return goalFood;
}));

◆ realization 0.000

python

import torch
import pyro
import pyro.distributions as dist

NEG_INF = torch.tensor(float("-inf"))
ZERO = torch.tensor(0.0)

actions = ["a", "b", "c"]
foods = ["bagel", "cookie", "doughnut"]

# Vending machine: P(outcome | action).
vm_table = torch.tensor([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]])


# Inner chooseAction inference (WebPPL Infer 'enumerate'): action ~ uniform,
# outcome ~ vendingMachine(action), condition on deceive?!goal:goal. Returns the
# log-prob marginal over actions, computed by Pyro enumeration.
def choose_action_logprobs(goal_idx, deceive):
    @pyro.infer.config_enumerate
    def m():
        action = pyro.sample("action", dist.Categorical(torch.ones(len(actions))))
        outcome = pyro.sample("outcome", dist.Categorical(vm_table[action]))
        is_goal = outcome == goal_idx
        cond = (~is_goal) if deceive else is_goal
        pyro.factor("ev", torch.where(cond, ZERO, NEG_INF))
        return action

    marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(m, lambda: None)
    return marg["action"].log_prob(torch.arange(len(actions)))


# condition(deceive) forces deceive=True; observe Sally chose action 'b'.
b_idx = actions.index("b")
lp_b = torch.stack(
    [choose_action_logprobs(g, True)[b_idx] for g in range(len(foods))]
)


# Observer infers goal food given deceive=True and action b.
@pyro.infer.config_enumerate
def model():
    deceive = pyro.sample("deceive", dist.Bernoulli(0.5)).long()
    goal = pyro.sample("goal", dist.Categorical(torch.ones(len(foods))))
    pyro.factor("deceive_ev", torch.where(deceive == 1, ZERO, NEG_INF))
    pyro.factor("action_ev", lp_b[goal])
    return goal


marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
d = marg["goal"]
ANSWER = {foods[i]: float(torch.exp(d.log_prob(torch.tensor(i)))) for i in range(len(foods))}

02 answer overlay: webppl vs pyro, dist/finite

[bar chart: webppl vs pyro, 3 bins]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0000 (tv)
solver re-derivation	accept	2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000

probmods2-social-cognition / ex1.2

answer dist/finite solver accept pyro pass 0.0000

00 statement source: exercises/social-cognition.md

given

There are three actions {a, b, c} and three foods {bagel, cookie, doughnut}, each with a uniform prior (probability 1/3 each). Each agent has a goal food (drawn uniformly) and a deceptive disposition (fair coin, probability 0.5). The vending machine maps actions to food outcomes as follows: action a yields bagel with probability 0.8, cookie with 0.1, doughnut with 0.1; action b yields bagel with probability 0.1, cookie with 0.8, doughnut with 0.1; action c yields bagel with probability 0.1, cookie with 0.1, doughnut with 0.8. A non-deceptive agent chooses each action with probability proportional to the chance that its vending machine outcome matches her goal food; a deceptive agent chooses each action with probability proportional to the chance that its outcome does NOT match her goal food. Sally is observed choosing action b on two independent occasions.

model

Sally has a latent goal food and a latent deceptive/non-deceptive disposition. Her action distribution is computed by enumerating all three actions and conditioning on whether the stochastic vending machine outcome matches (non-deceptive) or mismatches (deceptive) her goal. Both observations are independent draws from this same action distribution.

query

The posterior distribution over Sally's goal food, given the two observations.
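The two-observation case can be checked exactly with rational arithmetic. The following is a minimal sketch under the assumptions stated above (uniform priors, a fair coin for the deceptive disposition, and both observations drawn independently from the same policy); the names `action_policy` and `goal_posterior` are hypothetical helpers, and the deceptive disposition is summed out because it is not observed here.

```python
from fractions import Fraction

FOODS = ["bagel", "cookie", "doughnut"]
ACTIONS = ["a", "b", "c"]
HIT, MISS = Fraction(8, 10), Fraction(1, 10)
# Action a favours bagel, b favours cookie, c favours doughnut.
VENDING = {
    action: {food: (HIT if i == j else MISS) for j, food in enumerate(FOODS)}
    for i, action in enumerate(ACTIONS)
}


def action_policy(goal, deceive):
    """P(action | goal, deceive) under a uniform action prior."""
    weights = {
        a: (1 - VENDING[a][goal]) if deceive else VENDING[a][goal]
        for a in ACTIONS
    }
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def goal_posterior(observed_actions):
    """P(goal | observed actions), marginalizing the deceptive disposition."""
    scores = {}
    for goal in FOODS:
        score = Fraction(0)
        for deceive in (False, True):
            policy = action_policy(goal, deceive)
            likelihood = Fraction(1)
            for action in observed_actions:
                likelihood *= policy[action]  # independent draws multiply
            score += Fraction(1, 2) * Fraction(1, len(FOODS)) * likelihood
        scores[goal] = score
    total = sum(scores.values())
    return {goal: s / total for goal, s in scores.items()}


posterior = goal_posterior(["b", "b"])
```

Under these assumptions `posterior` is bagel 17/86, cookie 26/43, doughnut 17/86 (about 0.198, 0.605, 0.198): seeing b twice favours a non-deceptive Sally who wants a cookie, since a deceptive cookie-seeker would choose b only 10% of the time.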

answer spec dist/finite

{
  "kind": "dist",
  "domain": "finite",
  "support": [
    "bagel",
    "cookie",
    "doughnut"
  ]
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var actionPrior = Categorical({vs: ['a', 'b', 'c'], ps: [1/3, 1/3, 1/3]});
var foodPrior = Categorical({vs: ['bagel', 'cookie', 'doughnut'], ps: [1/3, 1/3, 1/3]});

var vendingMachine = function(state, action) {
  return action == 'a' ? categorical({vs: ['bagel', 'cookie', 'doughnut'], ps: [.8, .1, .1]}) :
         action == 'b' ? categorical({vs: ['bagel', 'cookie', 'doughnut'], ps: [.1, .8, .1]}) :
         action == 'c' ? categorical({vs: ['bagel', 'cookie', 'doughnut'], ps: [.1, .1, .8]}) :
         'nothing';
};

var chooseAction = function(goal, transition, state, deceive) {
  return Infer({method: 'enumerate'}, function() {
    var action = sample(actionPrior);
    var outcome = transition(state, action);
    condition(deceive ? !goal(outcome) : goal(outcome));
    return action;
  });
};
var ANSWER = (Infer({method: 'enumerate'}, function() {
  var deceive = flip();
  var goalFood = sample(foodPrior);
  var goal = function(outcome) {return outcome == goalFood};
  var sallyActionDist = chooseAction(goal, vendingMachine, 'state', deceive);
  condition(sample(sallyActionDist) == 'b');
  condition(sample(sallyActionDist) == 'b');
  return goalFood;
}));

◆ realization 0.000

python

import torch
import pyro
import pyro.distributions as dist

# Sally's-goal inference with a NESTED enumeration over the vending-machine
# outcome. The inner chooseAction marginal is a separate, completely-finished
# enumeration (action ~ uniform, outcome ~ vendingMachine(action), condition on
# deceive ? !goal(outcome) : goal(outcome)) read via compute_marginals over the
# `action` site. All inner marginals are fully computed and memoized BEFORE the
# outer enumeration runs, so no inference runs inside another's active
# enumeration. Inner/outer site names are disjoint (action_in/outcome_in vs
# deceive/goal_food).

FOODS = ["bagel", "cookie", "doughnut"]
ACTIONS = ["a", "b", "c"]

# vendingMachine(action) -> categorical over foods, indexed [action, food]
VEND_PROBS = torch.tensor([
    [0.8, 0.1, 0.1],  # action a
    [0.1, 0.8, 0.1],  # action b
    [0.1, 0.1, 0.8],  # action c
])
NEG_INF = torch.tensor(float("-inf"))
ZERO = torch.tensor(0.0)


def marginal_dict(model, site, support):
    marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(
        model, lambda: None
    )[site]
    sup = marg.enumerate_support()
    probs = marg.log_prob(sup).exp()
    out = {}
    for i in range(sup.shape[0]):
        out[int(sup[i].item())] = float(probs[i].item())
    return out


# ---- Inner: chooseAction marginal over actions, per (goal_idx, deceive) ----
CA_cache = {}


def choose_action_probs(goal_idx, deceive):
    key = (goal_idx, deceive)
    if key in CA_cache:
        return CA_cache[key]

    @pyro.infer.config_enumerate
    def inner():
        action = pyro.sample("action_in", dist.Categorical(probs=torch.ones(3) / 3))
        outcome = pyro.sample("outcome_in", dist.Categorical(probs=VEND_PROBS[action]))
        achieves = outcome == goal_idx
        ok = (~achieves) if deceive else achieves
        pyro.factor("goal_cond", torch.where(ok, ZERO, NEG_INF))

    d = marginal_dict(inner, "action_in", list(range(3)))
    out = torch.zeros(3)
    for a, p in d.items():
        out[a] = p
    CA_cache[key] = out
    return out


# Pre-warm every inner marginal BEFORE the outer enumeration runs.
# action_dists[deceive, goal, action]
action_dists = torch.stack([
    torch.stack([choose_action_probs(g, bool(dv)) for g in range(3)])
    for dv in (0, 1)
])


@pyro.infer.config_enumerate
def model():
    deceive = pyro.sample("deceive", dist.Bernoulli(0.5)).long()
    goal_food = pyro.sample("goal_food", dist.Categorical(probs=torch.ones(3) / 3))
    # Probability Sally takes action 'b' (index 1) under her policy.
    p_b = action_dists[deceive, goal_food, 1].clamp(min=1e-12)
    logp = torch.log(p_b)
    # condition on sampling 'b' from her action distribution, twice.
    pyro.factor("obs_b_1", logp)
    pyro.factor("obs_b_2", logp)


d = marginal_dict(model, "goal_food", FOODS)
ANSWER = {FOODS[k]: v for k, v in d.items()}

02 answer overlay: webppl vs pyro, dist/finite

[bar chart: webppl vs pyro, 3 bins]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0000 (tv)
solver re-derivation	accept	2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000

probmods2-social-cognition / ex2.1

answer record(stay, switch) solver accept pyro pass 0.0000

00 statement source: exercises/social-cognition.md

given

There are three doors {1, 2, 3}. Exactly one door hides a prize; the others are empty. Alice picks a door uniformly at random. Monty then picks a door uniformly at random from all three doors, independently of Alice's choice and the prize location. We observe that Monty's door turns out to be neither Alice's door nor the prize door. If Alice switches, she moves to the one remaining door that is neither her original door nor Monty's door.

model

Alice and the prize are each placed uniformly and independently among the three doors. Monty selects uniformly from all three doors. The joint world is conditioned on Monty's door being different from both Alice's choice and the prize door. Alice's final door is determined by her strategy (stay or switch).

query

Return a record with two fields: `stay` — the posterior distribution over whether Alice wins if she keeps her original door; `switch` — the posterior distribution over whether Alice wins if she switches to the remaining door.
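The random-Monty variant is small enough to verify by brute force over all 27 equally likely (Alice, prize, Monty) worlds. The sketch below is an illustration in plain Python, not the benchmark's solver; `win_distribution` is a hypothetical helper name, and conditioning is implemented by discarding the worlds that contradict the observation and renormalizing.

```python
from fractions import Fraction
from itertools import product

DOORS = (1, 2, 3)


def win_distribution(switches):
    """P(Alice wins) when Monty picks uniformly from all three doors and is
    then observed to have opened neither Alice's door nor the prize door."""
    win_weight = Fraction(0)
    total_weight = Fraction(0)
    for alice, prize, monty in product(DOORS, repeat=3):
        if monty == alice or monty == prize:
            continue  # world ruled out by the observation
        weight = Fraction(1, 27)  # three independent uniform choices
        if switches:
            # The single door that is neither Alice's nor Monty's.
            final = next(d for d in DOORS if d not in (alice, monty))
        else:
            final = alice
        total_weight += weight
        if final == prize:
            win_weight += weight
    p_win = win_weight / total_weight
    return {True: p_win, False: 1 - p_win}


answer = {"stay": win_distribution(False), "switch": win_distribution(True)}
```

Twelve worlds survive the conditioning: six with Alice on the prize door and six with Alice elsewhere, so both strategies win with probability 1/2. Because Monty's choice carries no information about the prize, switching gives no advantage in this variant.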

answer spec record(stay, switch)

{
  "kind": "record",
  "fields": {
    "stay": {
      "kind": "dist",
      "domain": "bool"
    },
    "switch": {
      "kind": "dist",
      "domain": "bool"
    }
  }
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var removeBadItems = function(l, badItems) {
  return reduce(function(badItem, remainingL) {
    return remove(badItem, remainingL)
  }, l, badItems);
};

var doors = [1, 2, 3];

var montyRandom = function(aliceDoor, prizeDoor) {
  return Infer({method: 'enumerate'}, function() {
    return categorical({vs: doors});
  });
};

var model = function(switches) {
  var aliceDoor = categorical({vs: doors});
  var prizeDoor = categorical({vs: doors});
  var montyDoorDist = montyRandom(aliceDoor, prizeDoor);
  var montyDoor = sample(montyDoorDist);
  condition(montyDoor != prizeDoor);
  condition(montyDoor != aliceDoor);
  var aliceDoor = switches ? removeBadItems(doors, [aliceDoor, montyDoor])[0] : aliceDoor;
  return aliceDoor == prizeDoor;
};
var ANSWER = (({
  stay: Infer({method: 'enumerate'}, function() { return model(false); }),
  switch: Infer({method: 'enumerate'}, function() { return model(true); })
}));

◆ realization 0.000

python

# probmods2-social-cognition/ex2.1
# Three doors {0,1,2}. Alice and prize placed uniformly & independently; Monty
# picks uniformly from all three doors. Condition: Monty != prize and Monty !=
# Alice. stay -> win iff Alice == prize; switch -> Alice moves to the remaining
# door (3 - Alice - Monty), win iff that == prize. Exact enumeration.

import torch
import pyro
import pyro.distributions as dist

ZERO = torch.tensor(0.0).double()
NEG_INF = torch.tensor(float("-inf")).double()

def make_model(switches):
    @pyro.infer.config_enumerate
    def model():
        alice = pyro.sample("alice", dist.Categorical(torch.ones(3) / 3.0))
        prize = pyro.sample("prize", dist.Categorical(torch.ones(3) / 3.0))
        monty = pyro.sample("monty", dist.Categorical(torch.ones(3) / 3.0))
        valid = (monty != prize) & (monty != alice)
        pyro.factor("cond", torch.where(valid, ZERO, NEG_INF))
        if switches:
            final = 3 - alice - monty  # remaining door (valid worlds: monty != alice)
        else:
            final = alice
        won = final == prize
        probs = torch.stack([(~won).double(), won.double()], dim=-1)
        pyro.sample("won", dist.Categorical(probs))
    return model

def win_dist(switches):
    model = make_model(switches)
    marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
    p = marg["won"].probs.detach()
    return {False: float(p[0].item()), True: float(p[1].item())}

ANSWER = {
    "stay": win_dist(False),
    "switch": win_dist(True),
}

02 answer overlay: webppl vs pyro, record(stay, switch)

stay: [bar chart: webppl vs pyro, 2 bins]

switch: [bar chart: webppl vs pyro, 2 bins]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0000 (record)
solver re-derivation	accept	2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000

probmods2-social-cognition / ex2.2

answer record(stay, switch) solver accept pyro pass 0.0000

00 statement source: exercises/social-cognition.md

given

There are three doors {1, 2, 3}. Exactly one door hides a prize; the others are empty. Alice picks a door uniformly at random. Monty deliberately picks a door uniformly at random from the doors that are neither Alice's door nor the prize door (so Monty always reveals an empty, non-Alice door). If Alice switches, she moves to the one remaining door that is neither her original door nor Monty's door.

model

Alice and the prize are each placed uniformly and independently among the three doors. Monty selects uniformly among doors that avoid both Alice's choice and the prize. Alice's final door is determined by her strategy (stay or switch).

query

Return a record with two fields: `stay` — the posterior distribution over whether Alice wins if she keeps her original door; `switch` — the posterior distribution over whether Alice wins if she switches to the remaining door.
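For the deliberate Monty, the key step is that his door is drawn from a distribution that is normalized separately for each (Alice, prize) pair. A minimal plain-Python sketch of that weighting follows; the helper names `monty_avoid_both` and `win_distribution` are hypothetical, and exact fractions are used so the result can be read off directly.

```python
from fractions import Fraction
from itertools import product

DOORS = (1, 2, 3)


def monty_avoid_both(alice, prize):
    """Monty's door distribution: uniform over doors that are neither
    Alice's door nor the prize door, normalized per (alice, prize) pair."""
    allowed = [d for d in DOORS if d != alice and d != prize]
    return {d: Fraction(1, len(allowed)) for d in allowed}


def win_distribution(switches):
    """P(Alice wins) under the stay or switch strategy."""
    p_win = Fraction(0)
    for alice, prize in product(DOORS, repeat=2):
        for monty, p_monty in monty_avoid_both(alice, prize).items():
            weight = Fraction(1, 9) * p_monty  # uniform Alice and prize
            if switches:
                # The single door that is neither Alice's nor Monty's.
                final = next(d for d in DOORS if d not in (alice, monty))
            else:
                final = alice
            if final == prize:
                p_win += weight
    return {True: p_win, False: 1 - p_win}


answer = {"stay": win_distribution(False), "switch": win_distribution(True)}
```

No extra conditioning step is needed because every door Monty can draw already satisfies the constraints. Each (Alice, prize) pair keeps its full prior weight of 1/9, so staying wins only when Alice started on the prize (probability 1/3) and switching wins with probability 2/3.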

answer spec record(stay, switch)

{
  "kind": "record",
  "fields": {
    "stay": {
      "kind": "dist",
      "domain": "bool"
    },
    "switch": {
      "kind": "dist",
      "domain": "bool"
    }
  }
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var removeBadItems = function(l, badItems) {
  return reduce(function(badItem, remainingL) {
    return remove(badItem, remainingL)
  }, l, badItems);
};

var doors = [1, 2, 3];

var montyAvoidBoth = function(aliceDoor, prizeDoor) {
  return Infer({method: 'enumerate'}, function() {
    var montyDoor = categorical({vs: doors});
    condition(montyDoor != aliceDoor);
    condition(montyDoor != prizeDoor);
    return montyDoor;
  });
};

var model = function(switches) {
  var aliceDoor = categorical({vs: doors});
  var prizeDoor = categorical({vs: doors});
  var montyDoorDist = montyAvoidBoth(aliceDoor, prizeDoor);
  var montyDoor = sample(montyDoorDist);
  condition(montyDoor != prizeDoor);
  condition(montyDoor != aliceDoor);
  var aliceDoor = switches ? removeBadItems(doors, [aliceDoor, montyDoor])[0] : aliceDoor;
  return aliceDoor == prizeDoor;
};
var ANSWER = (({
  stay: Infer({method: 'enumerate'}, function() { return model(false); }),
  switch: Infer({method: 'enumerate'}, function() { return model(true); })
}));

◆ realization 0.000

python

# Monty Hall where Monty avoids BOTH Alice's door and the prize door.
# Exact discrete enumeration through Pyro (config_enumerate + compute_marginals).
#
# The crux faithfully translated from the webppl_gt: Monty's door is sampled from
# the NESTED, NORMALIZED distribution montyAvoidBoth(alice, prize) -- an inner Infer
# that renormalizes over the valid doors GIVEN (alice, prize).  When alice == prize
# two doors are valid (each prob 1/2); when alice != prize a single door is valid
# (prob 1).  Sampling Monty from a flat Categorical + a factor would give the wrong
# weighting (the bug in the prior attempt: it yielded 1/2 instead of 1/3 for the
# stay case).  We build Monty as a Categorical whose per-door probabilities are the
# normalized validity mask for the enumerated (alice, prize) -- i.e. the finished
# inner distribution fed in as fixed scores -- so the outer conditions are already
# satisfied and no further factor is needed.  The win indicator (stay / switch) is
# pinned as a sample site so compute_marginals returns its exact bool marginal.

import torch
import pyro
import pyro.distributions as dist
from pyro.infer import config_enumerate, TraceEnum_ELBO

UNIFORM3 = torch.ones(3) / 3.0


def make_model(switches):
    @config_enumerate
    def model():
        alice = pyro.sample("alice", dist.Categorical(UNIFORM3))
        prize = pyro.sample("prize", dist.Categorical(UNIFORM3))

        # Normalized montyAvoidBoth(alice, prize): per-door validity mask,
        # renormalized over doors, with the 3-door axis placed LAST so it is the
        # Categorical event axis.  Stacking the per-door validity tensors along the
        # last axis broadcasts correctly against whatever enumeration dims alice and
        # prize carry, without hard-coding their shapes.
        per_door = [((alice != d) & (prize != d)).double() for d in range(3)]
        valid = torch.stack(per_door, dim=-1)     # shape: (<enum dims>, 3)
        monty_probs = valid / valid.sum(dim=-1, keepdim=True)
        monty = pyro.sample("monty", dist.Categorical(monty_probs))

        if switches:
            # Alice switches to the remaining door (not hers, not Monty's): 0+1+2=3.
            new_alice = 3 - alice - monty
            win = (new_alice == prize)
        else:
            win = (alice == prize)

        win_int = win.long()
        win_probs = torch.nn.functional.one_hot(win_int, 2).double()
        pyro.sample("win", dist.Categorical(win_probs))

    return model


def win_dist(switches):
    model = make_model(switches)
    marg = TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(model, lambda: None)
    w = marg["win"]
    sup = w.enumerate_support()
    probs = w.log_prob(sup).exp()
    out = {}
    for s, pr in zip(sup, probs):
        out[bool(int(s.item()))] = float(pr.item())
    return out


ANSWER = {"stay": win_dist(False), "switch": win_dist(True)}

02 answer overlay: webppl vs pyro, record(stay, switch)

stay: [bar chart: webppl vs pyro, 2 bins]

switch: [bar chart: webppl vs pyro, 2 bins]

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0000 (record)
solver re-derivation	accept	2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000

probmods2-social-cognition / ex2.4

answer record(stay, switch) solver accept pyro pass 0.0000

00 statement source: exercises/social-cognition.md

given

There are three doors {1, 2, 3}. Exactly one door hides a prize; the others are empty. Alice picks a door uniformly at random. Monty picks a door uniformly at random from the doors that are not Alice's door (he may inadvertently reveal the prize). We observe that Monty's door turns out to be neither Alice's door nor the prize door. If Alice switches, she moves to the one remaining door that is neither her original door nor Monty's door.

model

Alice and the prize are each placed uniformly and independently among the three doors. Monty selects uniformly among doors that avoid only Alice's choice, without regard to the prize. The joint world is conditioned on Monty's door turning out to be different from both Alice's and the prize door. Alice's final door is determined by her strategy (stay or switch).
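
The model described above is small enough to enumerate by hand. The following is a minimal sketch in plain Python, assuming the description is read literally: Monty is uniform over the two doors that are not Alice's, and worlds where his door is the prize door are discarded. The function name `posterior_win` is hypothetical.

```python
from fractions import Fraction
from itertools import product

DOORS = (1, 2, 3)


def posterior_win(switches):
    """Exact posterior over win (True/False) for the Monty-avoids-Alice variant."""
    weights = {True: Fraction(0), False: Fraction(0)}
    for alice, prize, monty in product(DOORS, repeat=3):
        if monty == alice:
            continue  # outside Monty's support: he never picks Alice's door
        if monty == prize:
            continue  # observation: Monty's door is not the prize door
        # Prior weight: 1/3 (Alice) * 1/3 (prize) * 1/2 (Monty over two doors).
        weight = Fraction(1, 3) * Fraction(1, 3) * Fraction(1, 2)
        # Exactly one door is left once Alice's and Monty's doors are removed.
        final = (set(DOORS) - {alice, monty}).pop() if switches else alice
        weights[final == prize] += weight
    total = sum(weights.values())
    return {outcome: w / total for outcome, w in weights.items()}


print(posterior_win(False))
print(posterior_win(True))
```

Under this reading both strategies win with probability 1/2. Because Monty ignores the prize when choosing, the observation that he missed it carries no information favouring the remaining door.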

query

Return a record with two fields: `stay` — the posterior distribution over whether Alice wins if she keeps her original door; `switch` — the posterior distribution over whether Alice wins if she switches to the remaining door.

answer spec record(stay, switch)

{
  "kind": "record",
  "fields": {
    "stay": {
      "kind": "dist",
      "domain": "bool"
    },
    "switch": {
      "kind": "dist",
      "domain": "bool"
    }
  }
}
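
A candidate answer can be checked against this spec with a few lines of Python. The following is a minimal sketch, assuming a boolean distribution is represented as a dict keyed by `True`/`False`, as in the Pyro realizations on this page. The `conforms` function is hypothetical and is not the harness's own validator, which is not shown here.

```python
import math

SPEC = {
    "kind": "record",
    "fields": {
        "stay": {"kind": "dist", "domain": "bool"},
        "switch": {"kind": "dist", "domain": "bool"},
    },
}


def conforms(answer, spec, tol=1e-9):
    """Return True if `answer` has the shape described by `spec`."""
    if spec["kind"] == "record":
        # A record must carry exactly the named fields, each matching its sub-spec.
        return (
            isinstance(answer, dict)
            and set(answer) == set(spec["fields"])
            and all(conforms(answer[k], sub, tol) for k, sub in spec["fields"].items())
        )
    if spec["kind"] == "dist" and spec["domain"] == "bool":
        # A boolean distribution: non-negative masses on True/False summing to 1.
        return (
            isinstance(answer, dict)
            and set(answer) <= {True, False}
            and all(p >= 0 for p in answer.values())
            and math.isclose(sum(answer.values()), 1.0, abs_tol=tol)
        )
    return False  # spec kinds other than those on this page are not handled


candidate = {"stay": {True: 0.5, False: 0.5}, "switch": {True: 0.5, False: 0.5}}
print(conforms(candidate, SPEC))
```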

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var removeBadItems = function(l, badItems) {
  return reduce(function(badItem, remainingL) {
    return remove(badItem, remainingL)
  }, l, badItems);
};

var doors = [1, 2, 3];

var montyAvoidAlice = function(aliceDoor, prizeDoor) {
  return Infer({method: 'enumerate'}, function() {
    var montyDoor = categorical({vs: doors});
    condition(montyDoor != aliceDoor);
    return montyDoor;
  });
};

var model = function(switches) {
  var aliceDoor = categorical({vs: doors});
  var prizeDoor = categorical({vs: doors});
  var montyDoorDist = montyAvoidAlice(aliceDoor, prizeDoor);
  var montyDoor = sample(montyDoorDist);
  condition(montyDoor != prizeDoor);
  condition(montyDoor != aliceDoor);
  var aliceDoor = switches ? removeBadItems(doors, [aliceDoor, montyDoor])[0] : aliceDoor;
  return aliceDoor == prizeDoor;
};
var ANSWER = (({
  stay: Infer({method: 'enumerate'}, function() { return model(false); }),
  switch: Infer({method: 'enumerate'}, function() { return model(true); })
}));

◆ realization 0.000

python

# probmods2-social-cognition/ex2.4
# Monty Hall. Alice and prize uniform on {1,2,3}. Monty uniform over the 3
# doors, conditioned to avoid Alice's door (ignoring the prize). Condition on
# Monty's door != prize and != Alice. Report P(win) for stay and switch via
# exact enumeration. The win outcome is its own enumerated sample site so the
# marginal is produced by Pyro inference, not hand computation.

def make_model(switches):
    @pyro.infer.config_enumerate
    def model():
        a = pyro.sample("alice", dist.Categorical(torch.ones(3) / 3))
        pr = pyro.sample("prize", dist.Categorical(torch.ones(3) / 3))
        m = pyro.sample("monty", dist.Categorical(torch.ones(3) / 3))
        # monty != alice  and  monty != prize  (hard conditioning)
        pyro.factor("monty_avoid_alice",
                    torch.where(m != a, torch.tensor(0.0), torch.tensor(float("-inf"))))
        pyro.factor("monty_not_prize",
                    torch.where(m != pr, torch.tensor(0.0), torch.tensor(float("-inf"))))
        if switches:
            # the single door that is neither alice's nor monty's (indices 0,1,2)
            final = 3 - a - m
            win = (final == pr).long()
        else:
            win = (a == pr).long()
        # record the win outcome as an enumerated sample site pinned to its value
        win_probs = torch.nn.functional.one_hot(win, num_classes=2).double()
        pyro.sample("win", dist.Categorical(win_probs))
        return win

    return model

def win_dist(switches):
    model = make_model(switches)
    marg = pyro.infer.TraceEnum_ELBO(max_plate_nesting=0).compute_marginals(
        model, lambda: None
    )
    w = marg["win"].probs
    return {True: w[1].item(), False: w[0].item()}

ANSWER = {"stay": win_dist(False), "switch": win_dist(True)}

02 answer overlay: webppl vs pyro, record(stay, switch)

stay: webppl, pyro (2 bins)

switch: webppl, pyro (2 bins)

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0000 (record)
solver re-derivation	accept	2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0000 ≤ tol 0.0000 · floors 0.0000/0.0000

probmods2-social-cognition / ex2.5

answer record(stay, switch) solver accept pyro pass 0.0045

00 statement source: exercises/social-cognition.md

given

There are three doors {1, 2, 3}. Exactly one door hides a prize; the others are empty. Alice picks a door uniformly at random. Monty picks a door uniformly at random from the doors that are not the prize door (he may inadvertently pick Alice's door). We observe that Monty's door turns out to be neither Alice's door nor the prize door. If Alice switches, she moves to the one remaining door that is neither her original door nor Monty's door.

model

Alice and the prize are each placed uniformly and independently among the three doors. Monty selects uniformly among doors that avoid only the prize door, without regard to Alice's choice. The joint world is conditioned on Monty's door turning out to be different from both Alice's and the prize door. Alice's final door is determined by her strategy (stay or switch).
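
The posterior for this variant can be worked out by counting. The following derivation is a sketch under the model as described above; the symbols are introduced here for illustration: $A$ is Alice's door, $Z$ the prize door, $M$ Monty's door, and $E = \{M \neq A,\ M \neq Z\}$ the observed event.

```latex
\begin{align*}
P(A = a,\, Z = z,\, M = m)
  &= \tfrac{1}{3} \cdot \tfrac{1}{3} \cdot \tfrac{1}{2}\,\mathbb{1}[m \neq z]
   = \tfrac{1}{18}\,\mathbb{1}[m \neq z], \\
P(E)
  &= \underbrace{3 \cdot 2 \cdot \tfrac{1}{18}}_{a = z}
   + \underbrace{6 \cdot 1 \cdot \tfrac{1}{18}}_{a \neq z}
   = \tfrac{2}{3}, \\
P(\text{stay wins} \mid E)
  &= \frac{3 \cdot 2 \cdot \tfrac{1}{18}}{\tfrac{2}{3}} = \tfrac{1}{2}, \\
P(\text{switch wins} \mid E)
  &= \frac{6 \cdot 1 \cdot \tfrac{1}{18}}{\tfrac{2}{3}} = \tfrac{1}{2}.
\end{align*}
```

When $a = z$ (3 pairs), both of Monty's allowed doors also avoid Alice, so staying wins in all of those worlds. When $a \neq z$ (6 pairs), only one of his two allowed doors avoids Alice, and the remaining door is then the prize, so switching wins.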

query

Return a record with two fields: `stay` — the posterior distribution over whether Alice wins if she keeps her original door; `switch` — the posterior distribution over whether Alice wins if she switches to the remaining door.

answer spec record(stay, switch)

{
  "kind": "record",
  "fields": {
    "stay": {
      "kind": "dist",
      "domain": "bool"
    },
    "switch": {
      "kind": "dist",
      "domain": "bool"
    }
  }
}

01 realizations comparing webppl vs pyro

◆ ground truth

webppl

var removeBadItems = function(l, badItems) {
  return reduce(function(badItem, remainingL) {
    return remove(badItem, remainingL)
  }, l, badItems);
};

var doors = [1, 2, 3];

var montyAvoidPrize = function(aliceDoor, prizeDoor) {
  return Infer({method: 'enumerate'}, function() {
    var montyDoor = categorical({vs: doors});
    condition(montyDoor != prizeDoor);
    return montyDoor;
  });
};

var model = function(switches) {
  var aliceDoor = categorical({vs: doors});
  var prizeDoor = categorical({vs: doors});
  var montyDoorDist = montyAvoidPrize(aliceDoor, prizeDoor);
  var montyDoor = sample(montyDoorDist);
  condition(montyDoor != prizeDoor);
  condition(montyDoor != aliceDoor);
  var aliceDoor = switches ? removeBadItems(doors, [aliceDoor, montyDoor])[0] : aliceDoor;
  return aliceDoor == prizeDoor;
};
var ANSWER = (({
  stay: Infer({method: 'enumerate'}, function() { return model(false); }),
  switch: Infer({method: 'enumerate'}, function() { return model(true); })
}));

◆ realization 0.004

python

import pyro.infer
from pyro.infer import config_enumerate, infer_discrete
from collections import defaultdict

# Monty Hall variant (Monty avoids only the PRIZE door when picking, then the
# extra conditions monty != prize and monty != alice are applied). Exact discrete
# enumeration over the three door latents with config_enumerate; the joint
# posterior is drawn with infer_discrete and each sampled triple is scored for
# stay/switch wins, aggregated into the two boolean distributions.

DOORS = [0, 1, 2]
NEG_INF = torch.tensor(float("-inf"))
ZERO = torch.tensor(0.0)


@config_enumerate
def model():
    alice = pyro.sample("alice", dist.Categorical(probs=torch.ones(3) / 3))
    prize = pyro.sample("prize", dist.Categorical(probs=torch.ones(3) / 3))
    monty = pyro.sample("monty", dist.Categorical(probs=torch.ones(3) / 3))
    # montyAvoidPrize selects monty with monty != prize; the outer model then
    # additionally conditions monty != prize and monty != alice. The net
    # constraint is monty != prize AND monty != alice.
    valid = (monty != prize) & (monty != alice)
    pyro.factor("monty_cond", torch.where(valid, ZERO, NEG_INF))


serving = infer_discrete(config_enumerate(model), first_available_dim=-1)

N = 4000
stay = defaultdict(float)
switch = defaultdict(float)
for _ in range(N):
    tr = pyro.poutine.trace(serving).get_trace()
    a = int(tr.nodes["alice"]["value"].item())
    p = int(tr.nodes["prize"]["value"].item())
    m = int(tr.nodes["monty"]["value"].item())
    stay[a == p] += 1.0
    other = [d for d in DOORS if d != a and d != m][0]
    switch[other == p] += 1.0

stay_total = sum(stay.values())
switch_total = sum(switch.values())
ANSWER = {
    "stay": {True: stay[True] / stay_total, False: stay[False] / stay_total},
    "switch": {True: switch[True] / switch_total, False: switch[False] / switch_total},
}

02 answer overlay: webppl vs pyro, record(stay, switch)

stay: webppl, pyro (2 bins)

switch: webppl, pyro (2 bins)

03 verification

check	status	evidence
GT self-consistency	ok	floor 0.0000 (record)
solver re-derivation	accept	2/2 solvers · d=[0.000, 0.000] · claude-sonnet-4-6
cross-language (pyro vs webppl)	pass	d=0.0045 ≤ tol 0.0480 · floors 0.0240/0.0000
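
The 0.0045 distance in the last row is about the size one would expect from sampling noise, since this realization estimates the win frequencies from N = 4000 draws instead of reading them off an exact marginal. The following is a rough check, assuming independent draws and an exact win probability of 0.5. That value is an assumption obtained by enumerating the model by hand and is not a figure reported on this page. `binomial_standard_error` is a hypothetical helper, and how the harness derives its own tolerance and floors is not shown here.

```python
import math

N = 4000        # number of posterior draws, taken from the realization above
P_EXACT = 0.5   # assumed exact win probability for this variant


def binomial_standard_error(p, n):
    """Standard error of a sample proportion estimated from n independent draws."""
    return math.sqrt(p * (1.0 - p) / n)


se = binomial_standard_error(P_EXACT, N)
observed_gap = 0.0045  # distance reported in the verification table

print(f"standard error: {se:.4f}")
print(f"gap in standard errors: {observed_gap / se:.2f}")
```

Under these assumptions the standard error is roughly 0.0079, so the reported gap sits well inside one standard error of the estimate.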
