posteriordb — ppl-gym

posteriordb-arK / arK

answer record(alpha, beta[1], beta[2], beta[3], beta[4], beta[5], sigma) stan pass 0.0038

00 statement source: posteriordb/arK-arK

given

The data provide a time series y of T real-valued observations and the number of autoregressive lags K, which is fixed. The intercept alpha has a Normal(location 0, scale 10) prior. Each of the K autoregressive lag coefficients beta[1] through beta[K] has a Normal(location 0, scale 10) prior. The observation noise standard deviation sigma, constrained to be positive, has a half-Cauchy(location 0, scale 2.5) prior.

model

An autoregressive time series model of order K. The first K observations serve as initial conditions: they are conditioned on rather than modeled. For each subsequent time point t from K+1 to T, y[t] is drawn from a normal distribution with standard deviation sigma. Its mean is the intercept alpha plus the autoregressive term beta[1]·y[t-1] + … + beta[K]·y[t-K].
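The conditional log-likelihood described above can be sketched in a few lines of Python. This is a minimal sketch assuming plain lists and 0-based indexing internally. The names `normal_logpdf` and `ar_loglik` are hypothetical and not part of posteriordb. Like the Stan loop, it treats the first K values as fixed initial conditions, and it omits the priors.

```python
import math

def normal_logpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def ar_loglik(y, alpha, beta, sigma):
    """Log-likelihood of y[K:] given the first K values, with K = len(beta).

    Mirrors the Stan loop `for (t in (K + 1) : T)`: the first K observations
    contribute nothing and only condition the later ones.
    """
    K = len(beta)
    total = 0.0
    for t in range(K, len(y)):
        # mean = alpha + sum over lags k = 1..K of beta[k] * y[t - k]
        mu = alpha + sum(beta[k - 1] * y[t - k] for k in range(1, K + 1))
        total += normal_logpdf(y[t], mu, sigma)
    return total
```

Evaluating `ar_loglik` at posterior draws and adding the prior log densities would reproduce the Stan `target`, up to constants.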

query

The marginal posterior distributions of the following parameters: alpha (the intercept), beta[1] through beta[K] (the K autoregressive lag coefficients), and sigma (the observation noise standard deviation).

answer spec record(alpha, beta[1], beta[2], beta[3], beta[4], beta[5], sigma)

{
  "kind": "record",
  "fields": {
    "alpha": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[3]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[4]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[5]": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

system prompt constant across problems

(system prompt loads here)

01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization · d = 0.004

stan

data {
  int<lower=0> K; // number of autoregressive lags
  int<lower=0> T; // number of observations
  array[T] real y; // observed time series
}
parameters {
  real alpha; // intercept
  array[K] real beta; // lag coefficients
  real<lower=0> sigma; // observation noise SD
}
model {
  alpha ~ normal(0, 10);
  beta ~ normal(0, 10);
  sigma ~ cauchy(0, 2.5); // half-Cauchy, given the lower=0 bound

  // the first K observations are conditioned on, not modeled
  for (t in (K + 1) : T) {
    real mu;
    mu = alpha;

    for (k in 1 : K) {
      mu = mu + beta[k] * y[t - k];
    }

    y[t] ~ normal(mu, sigma);
  }
}

//@ DATA { K: 5, T: 200, y: [200 values] }   // values supplied at runtime
//@ PARAMS ["alpha","beta[1]","beta[2]","beta[3]","beta[4]","beta[5]","sigma"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay: reference vs stan · record(alpha, beta[1], beta[2], beta[3], beta[4], beta[5], sigma)

parameter	histogram (reference and stan overlaid)	plotted range
alpha	24 bins	-0.03 … 0.03
beta[1]	24 bins	0.51 … 0.90
beta[2]	24 bins	0.17 … 0.72
beta[3]	24 bins	-0.21 … 0.45
beta[4]	24 bins	-0.27 … 0.22
beta[5]	24 bins	-0.50 … -0.10
sigma	24 bins	0.12 … 0.18

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.0038 ≤ tol 0.0132 · floors 0.0063/0.0066

posteriordb-bball_drive_event_0 / hmm_drive_0

answer record(theta1[1], theta1[2], theta2[1], theta2[2], phi[1], phi[2], lambda[1], lambda[2]) stan pass 0.0177

00 statement source: posteriordb/bball_drive_event_0-hmm_drive_0

given

For a sequence of N = 416 basketball drive-event observations, the data provide two measurements at each time step: u[t], the inverse player speed (1/speed), and v[t], the distance to the hoop. The model has K = 2 hidden states (state 1 = no drive event, state 2 = drive event).

A 2-by-2 matrix alpha with positive entries supplies the Dirichlet hyperparameters for the transition probability priors. Row k holds the hyperparameters for transitions out of state k. The parameters theta1 and theta2 are probability vectors that sum to 1 (simplices). theta1 governs transitions out of state 1 with a Dirichlet(alpha[1, :]) prior, and theta2 governs transitions out of state 2 with a Dirichlet(alpha[2, :]) prior.

The parameters phi and lambda are each pairs of positive values, ordered so that phi[1] <= phi[2] and lambda[1] <= lambda[2]. The priors are phi[1] ~ Normal(0, 1), phi[2] ~ Normal(3, 1), lambda[1] ~ Normal(0, 1) and lambda[2] ~ Normal(3, 1). Each is restricted to the positive reals: half-normal for the location-0 priors and truncated normal for the location-3 priors.

model

The observed sequence is generated by a discrete-time hidden Markov model with K = 2 hidden states. At the initial time step t = 1, the hidden state z[1] is drawn uniformly at random from the two states. At each subsequent time step t = 2, 3, ..., N, the hidden state z[t] transitions from z[t-1] according to a state-specific transition probability vector: if z[t-1] = 1, the next state is drawn from the categorical distribution determined by theta1; if z[t-1] = 2, the next state is drawn from theta2. Given the hidden state z[t] at time t, the two observations u[t] and v[t] are generated independently as exponential random variables with rates phi[z[t]] and lambda[z[t]] respectively.
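The likelihood of this model sums over all 2^N hidden-state paths. The forward algorithm in the Stan program performs that sum in O(N·K²) time by working in log space. Below is a minimal Python sketch of the same recursion, assuming plain lists and 0-based indices; the function names are hypothetical. Unlike the Stan program, it keeps the constant log(1/K) term for the uniform initial state. That term only shifts the log density by a constant and leaves the posterior unchanged.

```python
import math

def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))

def exponential_logpdf(x, rate):
    """Log density of Exponential(rate) at x >= 0."""
    return math.log(rate) - rate * x

def hmm_loglik(u, v, theta, phi, lam):
    """Marginal log-likelihood log p(u, v) of a K-state HMM.

    theta[j][k] = P(z[t] = k | z[t-1] = j); z at the first step is uniform.
    Emissions: u[t] ~ Exponential(phi[k]) and v[t] ~ Exponential(lam[k]).
    """
    K = len(phi)

    def emit(t, k):
        return exponential_logpdf(u[t], phi[k]) + exponential_logpdf(v[t], lam[k])

    # gamma[k] = log p(observations up to step t, z[t] = k)
    gamma = [math.log(1.0 / K) + emit(0, k) for k in range(K)]
    for t in range(1, len(u)):
        gamma = [
            log_sum_exp([gamma[j] + math.log(theta[j][k]) for j in range(K)]) + emit(t, k)
            for k in range(K)
        ]
    return log_sum_exp(gamma)
```

Adding the emission term outside the inner log-sum-exp is equivalent to the Stan code's version, which adds it to every `acc[j]`, because log-sum-exp(a + c) = log-sum-exp(a) + c.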

query

The marginal posterior distributions of the eight parameters: theta1[1] and theta1[2] (the state-conditional transition probabilities from state 1 to states 1 and 2 respectively), theta2[1] and theta2[2] (the state-conditional transition probabilities from state 2 to states 1 and 2 respectively), phi[1] and phi[2] (the exponential rate parameters for inverse speed in states 1 and 2), and lambda[1] and lambda[2] (the exponential rate parameters for hoop distance in states 1 and 2).

answer spec record(theta1[1], theta1[2], theta2[1], theta2[2], phi[1], phi[2], lambda[1], lambda[2])

{
  "kind": "record",
  "fields": {
    "theta1[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "theta1[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "theta2[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "theta2[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "phi[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "phi[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "lambda[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "lambda[2]": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization · d = 0.018

stan

// drive model (exponential dist)
data {
  int<lower=1> K; // number of states (1 = none, 2 = drive)
  int<lower=1> N; // length of process
  array[N] real u; // 1/speed
  array[N] real v; // hoop distance
  matrix<lower=0>[K, K] alpha; // transit prior
}
parameters {
  simplex[K] theta1;
  simplex[K] theta2;
  // enforce an ordering: phi[1] <= phi[2]
  positive_ordered[K] phi; // emission parameter for 1/speed
  positive_ordered[K] lambda; // emission parameter for hoop dist
}
transformed parameters {
  array[K] simplex[K] theta; // transit probs
  theta[1] = theta1;
  theta[2] = theta2;
}
model {
  // priors
  for (k in 1 : K) {
    target += dirichlet_lpdf(theta[k] | alpha[k,  : ]');
  }
  target += normal_lpdf(phi[1] | 0, 1);
  target += normal_lpdf(phi[2] | 3, 1);
  target += normal_lpdf(lambda[1] | 0, 1);
  target += normal_lpdf(lambda[2] | 3, 1);
  // forward algorithm
  {
    array[K] real acc;
    array[N, K] real gamma;
    // uniform initial state: the constant log(1/K) term is omitted
    for (k in 1 : K) {
      gamma[1, k] = exponential_lpdf(u[1] | phi[k])
                    + exponential_lpdf(v[1] | lambda[k]);
    }
    for (t in 2 : N) {
      for (k in 1 : K) {
        for (j in 1 : K) {
          acc[j] = gamma[t - 1, j] + log(theta[j, k])
                   + exponential_lpdf(u[t] | phi[k])
                   + exponential_lpdf(v[t] | lambda[k]);
        }
        gamma[t, k] = log_sum_exp(acc);
      }
    }
    target += log_sum_exp(gamma[N]);
  }
}
generated quantities {
  array[N] int<lower=1, upper=K> z_star;
  real log_p_z_star;
  // Viterbi algorithm
  {
    array[N, K] int back_ptr;
    array[N, K] real best_logp;
    // initialise the path score of every state k at t = 1
    for (k in 1 : K) {
      best_logp[1, k] = exponential_lpdf(u[1] | phi[k])
                        + exponential_lpdf(v[1] | lambda[k]);
    }
    for (t in 2 : N) {
      for (k in 1 : K) {
        best_logp[t, k] = negative_infinity();
        for (j in 1 : K) {
          real logp;
          logp = best_logp[t - 1, j] + log(theta[j, k])
                 + exponential_lpdf(u[t] | phi[k])
                 + exponential_lpdf(v[t] | lambda[k]);
          if (logp > best_logp[t, k]) {
            back_ptr[t, k] = j;
            best_logp[t, k] = logp;
          }
        }
      }
    }
    log_p_z_star = max(best_logp[N]);
    for (k in 1 : K) {
      if (best_logp[N, k] == log_p_z_star) {
        z_star[N] = k;
      }
    }
    for (t in 1 : (N - 1)) {
      z_star[N - t] = back_ptr[N - t + 1, z_star[N - t + 1]];
    }
  }
}

//@ DATA { N: 416, K: 2, u: [416 values], v: [416 values], alpha: [2×2 matrix] }   // values supplied at runtime
//@ PARAMS ["theta1[1]","theta1[2]","theta2[1]","theta2[2]","phi[1]","phi[2]","lambda[1]","lambda[2]"]
//@ SAMPLING {"chains":8,"iter_warmup":6000,"iter_sampling":3000,"adapt_delta":0.9}
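The generated quantities block recovers the single most probable hidden-state path with the Viterbi algorithm. Below is a Python sketch of the same recursion. It is written against a precomputed table of log emission densities, a hypothetical interface chosen so the sketch works for any emission model.

The sketch initialises the score of every state at the first time step. Like the Stan code, it drops the constant uniform initial-state term, so the returned score is on the same scale as log_p_z_star.

```python
import math

def viterbi(log_emit, theta):
    """Most probable hidden-state path of a K-state HMM with uniform initial state.

    log_emit[t][k]: log density of the observations at step t in state k.
    theta[j][k]:    P(z[t] = k | z[t-1] = j).
    Returns (path with 1-based states, log score of that path).
    """
    N, K = len(log_emit), len(log_emit[0])
    best = [log_emit[0][k] for k in range(K)]  # score of every state at the first step
    back = [[0] * K for _ in range(N)]         # back-pointers to the best predecessor
    for t in range(1, N):
        new = []
        for k in range(K):
            j_best = max(range(K), key=lambda j: best[j] + math.log(theta[j][k]))
            back[t][k] = j_best
            new.append(best[j_best] + math.log(theta[j_best][k]) + log_emit[t][k])
        best = new
    # trace the path back from the best final state
    z = [max(range(K), key=lambda k: best[k])]
    for t in range(N - 1, 0, -1):
        z.append(back[t][z[-1]])
    z.reverse()
    return [k + 1 for k in z], max(best)
```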

02 answer overlay: reference vs stan · record(theta1[1], theta1[2], theta2[1], theta2[2], phi[1], phi[2], lambda[1], lambda[2])

parameter	histogram (reference and stan overlaid)	plotted range
theta1[1]	24 bins	0.96 … 1.00
theta1[2]	24 bins	0.00 … 0.04
theta2[1]	24 bins	0.00 … 0.13
theta2[2]	24 bins	0.87 … 1.00
phi[1]	24 bins	1.45 … 2.15
phi[2]	24 bins	4.82 … 8.64
lambda[1]	11 bins	0.02 … 0.03
lambda[2]	24 bins	0.04 … 0.11

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.0177 ≤ tol 0.0727 · floors 0.0080/0.0307

posteriordb-bball_drive_event_1 / hmm_drive_1

answer record(theta1[1], theta1[2], theta2[1], theta2[2], phi[1], phi[2], lambda[1], lambda[2]) stan pass 0.0008

00 statement source: posteriordb/bball_drive_event_1-hmm_drive_1

given

For each of N time steps the data provide two real-valued observations: u[t], a movement measurement (inverse speed, which can be negative), and v[t], the distance to the hoop. The hidden Markov model has K = 2 states (state 1 = no drive, state 2 = drive). Transitions out of each state are governed by a transition probability vector — theta1 out of state 1 and theta2 out of state 2 — each a probability vector over the K states that sums to 1, with a Dirichlet prior whose concentration parameters are supplied as the rows of a K-by-K positive matrix alpha (row k for transitions out of state k). The per-state emission means are phi[1], phi[2] for u and lambda[1], lambda[2] for v; phi and lambda are each ordered so that phi[1] <= phi[2] and lambda[1] <= lambda[2]. The priors are phi[1] ~ Normal(0, 1), phi[2] ~ Normal(3, 1), lambda[1] ~ Normal(0, 1), and lambda[2] ~ Normal(3, 1). The emission standard deviations are fixed and supplied as data: tau for u and rho for v.

model

The observed sequence is generated by a two-state hidden Markov model. The hidden state at the first time step is equally likely to be either state. At each subsequent step the hidden state transitions from the previous one according to that state's transition vector (theta1 if the previous state was 1, theta2 if it was 2). Given the hidden state k at a time step, the two observations are generated independently and normally: u[t] ~ Normal(phi[k], tau) and v[t] ~ Normal(lambda[k], rho).
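A forward simulation makes this generative description concrete. The Python sketch below draws states and observations in the order described. `simulate_hmm` is a hypothetical name, and the parameter values in the usage line are illustrative placeholders, not posterior estimates.

```python
import random

def simulate_hmm(N, theta, phi, lam, tau, rho, seed=0):
    """Draw (z, u, v) from the two-state HMM with Gaussian emissions.

    theta[j] is the transition probability vector out of state j (0-based here);
    states are returned 1-based to match the model description.
    """
    rng = random.Random(seed)
    K = len(phi)
    z = [rng.randrange(K)]  # first state: uniform over the K states
    for _ in range(1, N):
        # categorical draw using the row of the previous state
        z.append(rng.choices(range(K), weights=theta[z[-1]])[0])
    u = [rng.gauss(phi[k], tau) for k in z]  # u[t] ~ Normal(phi[k], tau)
    v = [rng.gauss(lam[k], rho) for k in z]  # v[t] ~ Normal(lambda[k], rho)
    return [k + 1 for k in z], u, v

# Illustrative parameter values only
z, u, v = simulate_hmm(416, [[0.95, 0.05], [0.02, 0.98]], [-2.0, -0.5], [2.4, 3.5], 0.1, 0.1)
```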

query

The marginal posterior distribution of each of the eight parameters: theta1[1], theta1[2] (the probabilities of transitioning from state 1 to states 1 and 2), theta2[1], theta2[2] (from state 2 to states 1 and 2), phi[1], phi[2] (the u-emission means for states 1 and 2, with phi[1] <= phi[2]), and lambda[1], lambda[2] (the v-emission means for states 1 and 2, with lambda[1] <= lambda[2]).

answer spec record(theta1[1], theta1[2], theta2[1], theta2[2], phi[1], phi[2], lambda[1], lambda[2])

{
  "kind": "record",
  "fields": {
    "theta1[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "theta1[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "theta2[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "theta2[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "phi[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "phi[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "lambda[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "lambda[2]": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization · d = 7.6e-4

stan

// drive model (normal dist)
data {
  int<lower=1> K; // number of states (1 = none, 2 = drive)
  int<lower=1> N; // length of process
  array[N] real u; // 1/speed
  array[N] real v; // hoop distance
  matrix<lower=0>[K, K] alpha; // transit prior
  real<lower=0> tau; // sd u
  real<lower=0> rho; // sd v
}
parameters {
  simplex[K] theta1;
  simplex[K] theta2;
  // enforce an ordering: phi[1] <= phi[2]
  ordered[K] phi; // emission parameter for 1/speed
  ordered[K] lambda; // emission parameter for hoop dist
}
transformed parameters {
  array[K] simplex[K] theta; // transit probs
  theta[1] = theta1;
  theta[2] = theta2;
}
model {
  // priors
  for (k in 1 : K) {
    target += dirichlet_lpdf(theta[k] | alpha[k,  : ]');
  }
  target += normal_lpdf(phi[1] | 0, 1);
  target += normal_lpdf(phi[2] | 3, 1);
  target += normal_lpdf(lambda[1] | 0, 1);
  target += normal_lpdf(lambda[2] | 3, 1);
  // forward algorithm
  {
    array[K] real acc;
    array[N, K] real gamma;
    // uniform initial state: the constant log(1/K) term is omitted
    for (k in 1 : K) {
      gamma[1, k] = normal_lpdf(u[1] | phi[k], tau)
                    + normal_lpdf(v[1] | lambda[k], rho);
    }
    for (t in 2 : N) {
      for (k in 1 : K) {
        for (j in 1 : K) {
          acc[j] = gamma[t - 1, j] + log(theta[j, k])
                   + normal_lpdf(u[t] | phi[k], tau)
                   + normal_lpdf(v[t] | lambda[k], rho);
        }
        gamma[t, k] = log_sum_exp(acc);
      }
    }
    target += log_sum_exp(gamma[N]);
  }
}
generated quantities {
  array[N] int<lower=1, upper=K> z_star;
  real log_p_z_star;
  // Viterbi algorithm
  {
    array[N, K] int back_ptr;
    array[N, K] real best_logp;
    // initialise the path score of every state k at t = 1
    for (k in 1 : K) {
      best_logp[1, k] = normal_lpdf(u[1] | phi[k], tau)
                        + normal_lpdf(v[1] | lambda[k], rho);
    }
    for (t in 2 : N) {
      for (k in 1 : K) {
        best_logp[t, k] = negative_infinity();
        for (j in 1 : K) {
          real logp;
          logp = best_logp[t - 1, j] + log(theta[j, k])
                 + normal_lpdf(u[t] | phi[k], tau)
                 + normal_lpdf(v[t] | lambda[k], rho);
          if (logp > best_logp[t, k]) {
            back_ptr[t, k] = j;
            best_logp[t, k] = logp;
          }
        }
      }
    }
    log_p_z_star = max(best_logp[N]);
    for (k in 1 : K) {
      if (best_logp[N, k] == log_p_z_star) {
        z_star[N] = k;
      }
    }
    for (t in 1 : (N - 1)) {
      z_star[N - t] = back_ptr[N - t + 1, z_star[N - t + 1]];
    }
  }
}

//@ DATA { N: 416, K: 2, u: [416 values], v: [416 values], alpha: [2×2 matrix], tau: 0.1, rho: 0.1 }   // values supplied at runtime
//@ PARAMS ["theta1[1]","theta1[2]","theta2[1]","theta2[2]","phi[1]","phi[2]","lambda[1]","lambda[2]"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay: reference vs stan · record(theta1[1], theta1[2], theta2[1], theta2[2], phi[1], phi[2], lambda[1], lambda[2])

parameter	histogram (reference and stan overlaid)	plotted range
theta1[1]	24 bins	0.86 … 1.00
theta1[2]	24 bins	0.00 … 0.14
theta2[1]	24 bins	0.00 … 0.03
theta2[2]	24 bins	0.97 … 1.00
phi[1]	24 bins	-2.38 … -2.32
phi[2]	24 bins	-0.76 … -0.73
lambda[1]	24 bins	2.38 … 2.46
lambda[2]	24 bins	3.53 … 3.56

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.0008 ≤ tol 0.0026 · floors 0.0012/0.0013

posteriordb-diamonds / diamonds

answer record(b[1], b[2], b[3], b[4], b[5], b[6], b[7], b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15], b[16], b[17], b[18], b[19], b[20], b[21], b[22], b[23], b[24], Intercept, sigma) stan pass 0.0141

00 statement source: posteriordb/diamonds-diamonds

given

For N observations of diamonds, the data provide a response Y of log-transformed prices and an N-by-K design matrix X with K = 25 columns. The first column of X is all 1s (the intercept). The remaining 24 columns hold the numeric and categorical predictors, with categorical variables already encoded as numeric columns. The model centers these 24 predictors by subtracting each column's mean. A binary flag prior_only indicates whether the likelihood should be ignored (prior-only inference).

The 24 regression coefficients b[1] through b[24] have independent standard Normal(0, 1) priors. The intercept for the centered predictors, Intercept, has a Student-t prior with 3 degrees of freedom, location 8 and scale 10. The residual standard deviation sigma, constrained positive, has a half-Student-t prior: a Student-t(3, 0, 10) truncated to the positive reals.

model

Each observation's log price is normally distributed with a common standard deviation sigma. Its mean is Intercept plus the 24 centered predictors weighted by their coefficients b. Centering subtracts each predictor's column mean from every observation before the linear predictor is formed. As a result, Intercept is the expected log price when every predictor sits at its observed column mean. The intercept on the original, uncentered scale is Intercept minus the dot product of the column means with b (b_Intercept in the generated quantities).
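Centering only reparameterizes the intercept; the fitted linear predictor is unchanged. The Python sketch below uses plain lists and hypothetical helper names. It mirrors the centering done in the transformed data block and the back-transformation b_Intercept = Intercept - dot(means_X, b) used in generated quantities.

```python
def center_columns(X):
    """Drop the all-ones first column of X and center the remaining columns.

    Returns (Xc, means) where means[i] is the mean of column i + 1 of X.
    """
    n = len(X)
    K = len(X[0])
    means = [sum(row[i] for row in X) / n for i in range(1, K)]
    Xc = [[row[i] - means[i - 1] for i in range(1, K)] for row in X]
    return Xc, means

def uncentered_intercept(intercept, means, b):
    """Intercept on the original predictor scale: Intercept - dot(means_X, b)."""
    return intercept - sum(m * bi for m, bi in zip(means, b))
```

For any row, `intercept + dot(b, Xc_row)` equals `uncentered_intercept(...) + dot(b, X_row[1:])`. Centering mainly decorrelates the intercept from the slopes during sampling.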

query

The marginal posterior distributions of the 26 parameters: the 24 centered predictor coefficients b[1], b[2], ..., b[24], the intercept (reported as Intercept), and the residual standard deviation sigma.

answer spec record(b[1], b[2], b[3], b[4], b[5], b[6], b[7], b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15], b[16], b[17], b[18], b[19], b[20], b[21], b[22], b[23], b[24], Intercept, sigma)

{
  "kind": "record",
  "fields": {
    "b[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "b[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "b[3]": {
      "kind": "dist",
      "domain": "real"
    },
    "b[4]": {
      "kind": "dist",
      "domain": "real"
    },
    "b[5]": {
      "kind": "dist",
      "domain": "real"
    },
    "b[6]": {
      "kind": "dist",
      "domain": "real"
    },
    "b[7]": {
      "kind": "dist",
      "domain": "real"
    },
    "b[8]": {
      "kind": "dist",
      "domain": "real"
    },
    "b[9]": {
      "kind": "dist",
      "domain": "real"
    },
    "b[10]": {
      "kind": "dist",
      "domain": "real"
    },
    "b[11]": {
      "kind": "dist",
      "domain": "real"
    },
    "b[12]": {
      "kind": "dist",
      "domain": "real"
    },
    "b[13]": {
      "kind": "dist",
      "domain": "real"
    },
    "b[14]": {
      "kind": "dist",
      "domain": "real"
    },
    "b[15]": {
      "kind": "dist",
      "domain": "real"
    },
    "b[16]": {
      "kind": "dist",
      "domain": "real"
    },
    "b[17]": {
      "kind": "dist",
      "domain": "real"
    },
    "b[18]": {
      "kind": "dist",
      "domain": "real"
    },
    "b[19]": {
      "kind": "dist",
      "domain": "real"
    },
    "b[20]": {
      "kind": "dist",
      "domain": "real"
    },
    "b[21]": {
      "kind": "dist",
      "domain": "real"
    },
    "b[22]": {
      "kind": "dist",
      "domain": "real"
    },
    "b[23]": {
      "kind": "dist",
      "domain": "real"
    },
    "b[24]": {
      "kind": "dist",
      "domain": "real"
    },
    "Intercept": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization · d = 0.014

stan

// generated with brms 2.10.0

functions {

}
data {
  int<lower=1> N; // number of observations
  vector[N] Y; // response variable
  int<lower=1> K; // number of population-level effects
  matrix[N, K] X; // population-level design matrix
  int prior_only; // should the likelihood be ignored?
}
transformed data {
  int Kc = K - 1;
  matrix[N, Kc] Xc; // centered version of X without an intercept
  vector[Kc] means_X; // column means of X before centering
  for (i in 2 : K) {
    means_X[i - 1] = mean(X[ : , i]);
    Xc[ : , i - 1] = X[ : , i] - means_X[i - 1];
  }
}
parameters {
  vector[Kc] b; // population-level effects
  // temporary intercept for centered predictors
  real Intercept;
  real<lower=0> sigma; // residual SD
}
transformed parameters {

}
model {
  // priors including all constants
  target += normal_lpdf(b | 0, 1);
  target += student_t_lpdf(Intercept | 3, 8, 10);
  target += student_t_lpdf(sigma | 3, 0, 10)
            - 1 * student_t_lccdf(0 | 3, 0, 10);
  // likelihood including all constants
  if (!prior_only) {
    target += normal_id_glm_lpdf(Y | Xc, Intercept, b, sigma);
  }
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);
}

//@ DATA { N: 5000, Y: [5000 values], K: 25, X: [5000×25 matrix], prior_only: 0 }   // values supplied at runtime
//@ PARAMS ["b[1]","b[2]","b[3]","b[4]","b[5]","b[6]","b[7]","b[8]","b[9]","b[10]","b[11]","b[12]","b[13]","b[14]","b[15]","b[16]","b[17]","b[18]","b[19]","b[20]","b[21]","b[22]","b[23]","b[24]","Intercept","sigma"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay: reference vs stan · record(b[1], b[2], b[3], b[4], b[5], b[6], b[7], b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15], b[16], b[17], b[18], b[19], b[20], b[21], b[22], b[23], b[24], Intercept, sigma)

parameter	reference mean±sd	stan mean±sd
b[1]	6.64 ± 0.271	6.66 ± 0.235
b[2]	6.35 ± 0.322	6.38 ± 0.304
b[3]	-4.67 ± 0.314	-4.70 ± 0.302
b[4]	1.46 ± 0.149	1.44 ± 0.136
b[5]	0.134 ± 0.008	0.135 ± 0.008
b[6]	-0.040 ± 0.007	-0.041 ± 0.007
b[7]	0.023 ± 0.006	0.023 ± 0.006
b[8]	0.002 ± 0.004	0.002 ± 0.004
b[9]	-0.444 ± 0.006	-0.445 ± 0.006
b[10]	-0.093 ± 0.006	-0.093 ± 0.005
b[11]	-0.012 ± 0.005	-0.013 ± 0.005
b[12]	0.011 ± 0.005	0.011 ± 0.005
b[13]	-0.002 ± 0.005	-0.002 ± 0.004
b[14]	6.02e-4 ± 0.004	9.84e-4 ± 0.004
b[15]	0.900 ± 0.011	0.901 ± 0.011
b[16]	-0.221 ± 0.010	-0.221 ± 0.010
b[17]	0.131 ± 0.009	0.131 ± 0.009
b[18]	-0.058 ± 0.007	-0.057 ± 0.007
b[19]	0.018 ± 0.005	0.018 ± 0.006
b[20]	-0.002 ± 0.005	-0.002 ± 0.005
b[21]	0.032 ± 0.004	0.032 ± 0.004
b[22]	-6.09 ± 0.293	-6.12 ± 0.291
b[23]	4.62 ± 0.290	4.63 ± 0.289
b[24]	-1.45 ± 0.149	-1.43 ± 0.154
Intercept	7.79 ± 0.002	7.79 ± 0.002
sigma	0.123 ± 0.001	0.123 ± 0.001

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.0141 ≤ tol 0.0478 · floors 0.0239/0.0236

posteriordb-earnings / earn_height

answer record(beta[1], beta[2], sigma) stan pass 364.2235

00 statement source: posteriordb/earnings-earn_height

given

For each of N individuals, the data provide that person's earnings (in dollars) and height (in inches). The regression has two coefficients, an intercept and a slope on height, each with a flat (improper uniform) prior over the real line. The error standard deviation sigma, constrained positive, has a flat (improper uniform) prior over the positive reals.

model

Each individual's earnings are normally distributed with a mean equal to the intercept plus the slope times that individual's height, and a common standard deviation sigma across all individuals.
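With flat priors, the conditional posterior of (beta[1], beta[2]) given sigma is normal and centered at the ordinary least-squares fit. The marginal posteriors of the coefficients are therefore centered there as well. Below is a minimal Python sketch of the closed-form simple-regression estimates, which is handy as a sanity check on sampled means. The function name is hypothetical.

```python
def ols_simple(x, y):
    """Least-squares intercept and slope for y ≈ b1 + b2 * x.

    Under flat priors on (b1, b2), the coefficient posterior is centered here.
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sxy / sxx          # slope
    b1 = ybar - b2 * xbar   # intercept: the line passes through (xbar, ybar)
    return b1, b2
```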

query

The marginal posterior distribution of each of the three parameters: the intercept (reported as beta[1]), the slope on height (reported as beta[2]), and the error standard deviation sigma.

answer spec record(beta[1], beta[2], sigma)

{
  "kind": "record",
  "fields": {
    "beta[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization · d = 364.224

stan

data {
  int<lower=0> N;
  vector[N] earn;
  vector[N] height;
}
parameters {
  vector[2] beta;
  real<lower=0> sigma;
}
model {
  // no prior statements: beta and sigma get flat (improper) priors
  earn ~ normal(beta[1] + beta[2] * height, sigma);
}

//@ DATA { N: 1192, earn: [1192 values], height: [1192 values] }   // values supplied at runtime
//@ PARAMS ["beta[1]","beta[2]","sigma"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay: reference vs stan · record(beta[1], beta[2], sigma)

parameter	histogram (reference and stan overlaid)	plotted range
beta[1]	24 bins	-88976 … -31502
beta[2]	24 bins	800 … 1666
sigma	24 bins	17605 … 19923

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=364.2235 ≤ tol 1066.3760 · floors 533.1880/439.1448

posteriordb-earnings / log10earn_height

answer record(beta[1], beta[2], sigma) stan pass 0.0072

00 statement source: posteriordb/earnings-log10earn_height

given

For each of N individuals, the data provide that individual's earnings (a positive real value in dollars) and height (a real-valued measurement in inches). The model operates on a log base 10 transformation of earnings. The regression has two coefficients, an intercept and a slope on height, each with a flat (improper uniform) prior over the real line. The error standard deviation sigma, constrained positive, has a flat (improper uniform) prior over the positive reals.

model

For each individual, we compute the log base 10 of that individual's earnings. The log base 10 transformed earnings are normally distributed with mean equal to the intercept beta[1] plus the slope beta[2] times that individual's height, and with a common standard deviation sigma across all individuals.

query

The marginal posterior distribution of each of the three parameters: beta[1] (the intercept), beta[2] (the height coefficient), and sigma (the error standard deviation).

answer spec record(beta[1], beta[2], sigma)

{
  "kind": "record",
  "fields": {
    "beta[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization · d = 0.007

stan

data {
  int<lower=0> N;
  vector[N] earn;
  vector[N] height;
}
transformed data {
  // log 10 transformation
  vector[N] log10_earn;
  for (i in 1 : N) {
    log10_earn[i] = log10(earn[i]);
  }
}
parameters {
  vector[2] beta;
  real<lower=0> sigma;
}
model {
  // no prior statements: beta and sigma get flat (improper) priors
  log10_earn ~ normal(beta[1] + beta[2] * height, sigma);
}

//@ DATA { N: 1192, earn: [1192 values], height: [1192 values] }   // values supplied at runtime
//@ PARAMS ["beta[1]","beta[2]","sigma"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay: reference vs stan · record(beta[1], beta[2], sigma)

parameter	histogram (reference and stan overlaid)	plotted range
beta[1]	24 bins	1.96 … 3.03
beta[2]	24 bins	0.02 … 0.03
sigma	24 bins	0.37 … 0.41

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.0072 ≤ tol 0.0320 · floors 0.0087/0.0092

posteriordb-earnings / logearn_height

answer record(beta[1], beta[2], sigma) stan pass 0.0168

00 statement source: posteriordb/earnings-logearn_height

given

For each of N = 1192 adults the data provide annual earnings (in dollars, positive) and height (in inches). The model is fit to the natural logarithm of earnings. The regression has two coefficients, an intercept and a slope on height, each with a flat (improper uniform) prior over the real line; the error standard deviation sigma, constrained positive, also has a flat (improper uniform) prior.

model

The natural logarithm of each person's earnings is Normal-distributed with a mean equal to the intercept plus the slope times that person's height, and a common standard deviation sigma across all people.
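Because ln(earn) = ln(10) × log10(earn) and every prior here is flat, this model is an exact rescaling of earnings / log10earn_height. Each of beta[1], beta[2] and sigma equals ln(10) ≈ 2.303 times its log10-scale counterpart.

The Python sketch below applies that factor to the endpoints of the log10 model's plotted ranges. The helper name `log10_to_ln` is illustrative and not part of posteriordb. The results roughly line up with the ranges plotted for this problem; exact agreement is not expected because histogram extents depend on the most extreme draws.

```python
import math

LN10 = math.log(10.0)  # ln(earn) = LN10 * log10(earn)

def log10_to_ln(beta1, beta2, sigma):
    """Map log10-earnings parameters to the natural-log-earnings scale."""
    return LN10 * beta1, LN10 * beta2, LN10 * sigma

print([round(LN10 * b, 2) for b in (1.96, 3.03)])  # beta[1] endpoints → [4.51, 6.98]
print([round(LN10 * s, 2) for s in (0.37, 0.41)])  # sigma endpoints → [0.85, 0.94]
```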

query

The marginal posterior distribution of each of the three parameters, all on the log-earnings scale: the intercept (reported as beta[1]), the slope on height (reported as beta[2]), and the error standard deviation sigma.

answer spec record(beta[1], beta[2], sigma)

{
  "kind": "record",
  "fields": {
    "beta[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization · d = 0.017

stan

data {
  int<lower=0> N;
  vector[N] earn;
  vector[N] height;
}
transformed data {
  // log transformation
  vector[N] log_earn;
  log_earn = log(earn);
}
parameters {
  vector[2] beta;
  real<lower=0> sigma;
}
model {
  // no prior statements: beta and sigma get flat (improper) priors
  log_earn ~ normal(beta[1] + beta[2] * height, sigma);
}

//@ DATA { N: 1192, earn: [1192 values], height: [1192 values] }   // values supplied at runtime
//@ PARAMS ["beta[1]","beta[2]","sigma"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay: reference vs stan · record(beta[1], beta[2], sigma)

parameter	histogram (reference and stan overlaid)	plotted range
beta[1]	24 bins	4.49 … 7.09
beta[2]	24 bins	0.04 … 0.08
sigma	24 bins	0.83 … 0.95

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.0168 ≤ tol 0.0765 · floors 0.0382/0.0189

posteriordb-earnings / logearn_height_male

answer record(beta[1], beta[2], beta[3], sigma) stan pass 0.0340

00 statement source: posteriordb/earnings-logearn_height_male

given

For each of N = 1192 individuals the data provide earnings in dollars and two predictors: height measured in inches and a binary indicator of male gender. The regression has three coefficients—an intercept, a slope on height, and a slope on the male indicator—each with a flat (improper uniform) prior over the real line. The error standard deviation sigma, constrained positive, has a flat (improper uniform) prior over the positive reals.

model

The natural logarithm of each individual's earnings is normally distributed with a mean equal to the intercept plus the height coefficient times that individual's height plus the male coefficient times that individual's gender indicator, and a common standard deviation sigma across all individuals.

query

The marginal posterior distribution of each of the four parameters: the intercept (reported as beta[1]), the slope on height (reported as beta[2]), the slope on the male indicator (reported as beta[3]), and the error standard deviation sigma.

answer spec record(beta[1], beta[2], beta[3], sigma)

{
  "kind": "record",
  "fields": {
    "beta[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[3]": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization 0.034

stan

data {
  int<lower=0> N;
  vector[N] earn;
  vector[N] height;
  vector[N] male;
}
transformed data {
  // log transformation
  vector[N] log_earn;
  log_earn = log(earn);
}
parameters {
  vector[3] beta;
  real<lower=0> sigma;
}
model {
  log_earn ~ normal(beta[1] + beta[2] * height + beta[3] * male, sigma);
}

//@ DATA { N: 1192, earn: [1192 values], height: [1192 values], male: [1192 values] }   // values supplied at runtime
//@ PARAMS ["beta[1]","beta[2]","beta[3]","sigma"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay: reference vs stan · record(beta[1], beta[2], beta[3], sigma)

beta[1]: reference vs stan histograms, 24 bins over 6.45 … 9.63
beta[2]: reference vs stan histograms, 24 bins over -0.00 … 0.05
beta[3]: reference vs stan histograms, 24 bins over 0.23 … 0.63
sigma: reference vs stan histograms, 24 bins over 0.83 … 0.94

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.0340 ≤ tol 0.0895 · floors 0.0448/0.0294

posteriordb-earnings / logearn_interaction

answer record(beta[1], beta[2], beta[3], beta[4], sigma) stan pass 0.0429

00 statement source: posteriordb/earnings-logearn_interaction

given

For each of N = 1192 individuals, the data provide the individual's earnings (a positive real-valued number), height in inches, and a binary indicator of male gender (1 = male, 0 = female). The model operates on the natural logarithm of earnings. The regression has four coefficients (an intercept, a slope on height, a slope on the male indicator, and a slope on the height-male interaction), each with a flat (improper uniform) prior over the real line. The residual standard deviation sigma, constrained positive, has a flat (improper uniform) prior over the positive reals.

model

Each individual's log earnings is normally distributed with a mean equal to the intercept plus the height coefficient times that individual's height, plus the male coefficient times that individual's male indicator, plus the interaction coefficient times the product of height and male indicator. The standard deviation of this normal distribution is sigma, common across all individuals.
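Because the mean includes the height-male product, the expected male-minus-female difference in log earnings for two people of the same height is beta[3] + beta[4] * height. On its own, beta[3] is therefore that difference at a height of 0 inches, far outside the data. A minimal sketch, assuming a hypothetical extra data input `height_ref` (not part of the posteriordb data) at which to report the gap, extends the posteriordb program with a generated quantities block; the names `male_gap`, `slope_female`, and `slope_male` are likewise illustrative:

```stan
data {
  int<lower=0> N;
  vector[N] earn;
  vector[N] height;
  vector[N] male;
  real height_ref;  // hypothetical: reference height (inches) for reporting the gap
}
transformed data {
  vector[N] log_earn = log(earn);    // log transformation
  vector[N] inter = height .* male;  // height-male interaction
}
parameters {
  vector[4] beta;
  real<lower=0> sigma;
}
model {
  // no explicit priors: flat (improper) priors on beta and sigma
  log_earn ~ normal(beta[1] + beta[2] * height + beta[3] * male
                    + beta[4] * inter, sigma);
}
generated quantities {
  // male minus female expected log earnings at height_ref
  real male_gap = beta[3] + beta[4] * height_ref;
  // slope on height for each group
  real slope_female = beta[2];
  real slope_male = beta[2] + beta[4];
}
```

Setting `height_ref` to the sample mean height should give a gap comparable to beta[3] in the standardized-height variant of this model, since both describe the same linear predictor under flat priors.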

query

The marginal posterior distributions of the five parameters: the intercept (beta[1]), the height coefficient (beta[2]), the male indicator coefficient (beta[3]), the height-male interaction coefficient (beta[4]), and the residual standard deviation sigma.

answer spec record(beta[1], beta[2], beta[3], beta[4], sigma)

{
  "kind": "record",
  "fields": {
    "beta[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[3]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[4]": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization 0.043

stan

data {
  int<lower=0> N;
  vector[N] earn;
  vector[N] height;
  vector[N] male;
}
transformed data {
  vector[N] log_earn; // log transformation
  vector[N] inter; // interaction
  log_earn = log(earn);
  inter = height .* male;
}
parameters {
  vector[4] beta;
  real<lower=0> sigma;
}
model {
  log_earn ~ normal(beta[1] + beta[2] * height + beta[3] * male
                    + beta[4] * inter, sigma);
}

//@ DATA { N: 1192, earn: [1192 values], height: [1192 values], male: [1192 values] }   // values supplied at runtime
//@ PARAMS ["beta[1]","beta[2]","beta[3]","beta[4]","sigma"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay: reference vs stan · record(beta[1], beta[2], beta[3], beta[4], sigma)

beta[1]: reference vs stan histograms, 24 bins over 5.57 … 10.6
beta[2]: reference vs stan histograms, 24 bins over -0.02 … 0.06
beta[3]: reference vs stan histograms, 24 bins over -4.49 … 3.83
beta[4]: reference vs stan histograms, 24 bins over -0.05 … 0.07
sigma: reference vs stan histograms, 24 bins over 0.83 … 0.94

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.0429 ≤ tol 0.2225 · floors 0.0904/0.0758

posteriordb-earnings / logearn_interaction_z

answer record(beta[1], beta[2], beta[3], beta[4], sigma) stan pass 0.0025

00 statement source: posteriordb/earnings-logearn_interaction_z

given

For each of N = 1192 individuals, the data provide raw earnings (positive reals), height in inches, and a binary male indicator (1 for male, 0 for female). The model operates on log-transformed earnings and standardized height (computed by subtracting the sample mean and dividing by the sample standard deviation of the height data). An interaction term is formed as the product of standardized height and the male indicator. All four regression coefficients beta[1], beta[2], beta[3], beta[4] have a flat (improper uniform) prior over the real line. The error standard deviation sigma, constrained positive, has a flat (improper uniform) prior over the positive reals (0, infinity).

model

For each individual, the natural logarithm of earnings is normally distributed with a mean equal to the sum of an intercept (beta[1]), a coefficient (beta[2]) times the person's standardized height, a coefficient (beta[3]) times the male indicator, and a coefficient (beta[4]) times the interaction between standardized height and male indicator, with a common standard deviation sigma across all individuals.
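Substituting z_height = (height - m) / s, with m and s the sample mean and standard deviation of height, into the linear predictor shows how these coefficients relate to the raw-inches parameterization: the intercept becomes beta[1] - beta[2] * m / s, the height slope beta[2] / s, the male coefficient beta[3] - beta[4] * m / s, and the interaction beta[4] / s. As a sketch (the generated quantity `beta_raw` and the names `m` and `s` are hypothetical), these can be computed per draw:

```stan
data {
  int<lower=0> N;
  vector[N] earn;
  vector[N] height;
  vector[N] male;
}
transformed data {
  vector[N] log_earn = log(earn);
  real m = mean(height);                 // sample mean of height
  real s = sd(height);                   // sample standard deviation of height
  vector[N] z_height = (height - m) / s; // standardized height
  vector[N] inter = z_height .* male;    // interaction on the standardized scale
}
parameters {
  vector[4] beta;
  real<lower=0> sigma;
}
model {
  log_earn ~ normal(beta[1] + beta[2] * z_height + beta[3] * male
                    + beta[4] * inter, sigma);
}
generated quantities {
  // coefficients re-expressed on the raw-inches scale, obtained by
  // substituting z_height = (height - m) / s into the linear predictor
  vector[4] beta_raw;
  beta_raw[1] = beta[1] - beta[2] * m / s;  // intercept
  beta_raw[2] = beta[2] / s;                // slope on height
  beta_raw[3] = beta[3] - beta[4] * m / s;  // male indicator
  beta_raw[4] = beta[4] / s;                // height x male interaction
}
```

Because the change of variables is linear, its Jacobian is constant, so under the flat priors used here `beta_raw` should have the same posterior as the coefficients of the unstandardized interaction model, up to Monte Carlo error.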

query

The marginal posterior distributions of the five parameters: beta[1] (the intercept), beta[2] (the standardized height coefficient), beta[3] (the male indicator coefficient), beta[4] (the interaction coefficient), and sigma (the error standard deviation).

answer spec record(beta[1], beta[2], beta[3], beta[4], sigma)

{
  "kind": "record",
  "fields": {
    "beta[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[3]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[4]": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization 0.003

stan

data {
  int<lower=0> N;
  vector[N] earn;
  vector[N] height;
  vector[N] male;
}
transformed data {
  vector[N] log_earn; // log transformation
  vector[N] z_height; // standardization
  vector[N] inter; // interaction
  log_earn = log(earn);
  z_height = (height - mean(height)) / sd(height);
  inter = z_height .* male;
}
parameters {
  vector[4] beta;
  real<lower=0> sigma;
}
model {
  log_earn ~ normal(beta[1] + beta[2] * z_height + beta[3] * male
                    + beta[4] * inter, sigma);
}

//@ DATA { N: 1192, earn: [1192 values], height: [1192 values], male: [1192 values] }   // values supplied at runtime
//@ PARAMS ["beta[1]","beta[2]","beta[3]","beta[4]","sigma"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay: reference vs stan · record(beta[1], beta[2], beta[3], beta[4], sigma)

beta[1]: reference vs stan histograms, 24 bins over 9.41 … 9.68
beta[2]: reference vs stan histograms, 24 bins over -0.10 … 0.24
beta[3]: reference vs stan histograms, 24 bins over 0.22 … 0.66
beta[4]: reference vs stan histograms, 24 bins over -0.23 … 0.28
sigma: reference vs stan histograms, 24 bins over 0.83 … 0.96

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.0025 ≤ tol 0.0119 · floors 0.0043/0.0039

posteriordb-earnings / logearn_logheight_male

answer record(beta[1], beta[2], beta[3], sigma) stan pass 0.0910

00 statement source: posteriordb/earnings-logearn_logheight_male

given

For each of N = 1192 workers the data provide the worker's earnings (a positive real value), height in inches (a positive real value), and a binary indicator of male gender. The model operates on the natural logarithm of earnings and the natural logarithm of height. The regression has three coefficients: an intercept, a slope on log-transformed height, and a slope on the male indicator, each with a flat (improper uniform) prior over the real line. The error standard deviation sigma, constrained positive, has a flat (improper uniform) prior over the positive reals.

model

Each worker's log-transformed earnings is Normal-distributed with a mean equal to the intercept plus the slope for log-transformed height times that worker's log-transformed height, plus the slope for male status times that worker's male indicator (0 if female, 1 if male), and a common standard deviation sigma across all workers.

query

The marginal posterior distribution of each of the four parameters: the intercept (reported as beta[1]), the slope on log-transformed height (reported as beta[2]), the slope on the male indicator (reported as beta[3]), and the error standard deviation sigma.

answer spec record(beta[1], beta[2], beta[3], sigma)

{
  "kind": "record",
  "fields": {
    "beta[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[3]": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization 0.091

stan

data {
  int<lower=0> N;
  vector[N] earn;
  vector[N] height;
  vector[N] male;
}
transformed data {
  vector[N] log_earn; // log transformations
  vector[N] log_height;
  log_earn = log(earn);
  log_height = log(height);
}
parameters {
  vector[3] beta;
  real<lower=0> sigma;
}
model {
  // vectorization
  log_earn ~ normal(beta[1] + beta[2] * log_height + beta[3] * male, sigma);
}

//@ DATA { N: 1192, earn: [1192 values], height: [1192 values], male: [1192 values] }   // values supplied at runtime
//@ PARAMS ["beta[1]","beta[2]","beta[3]","sigma"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay: reference vs stan · record(beta[1], beta[2], beta[3], sigma)

beta[1]: reference vs stan histograms, 24 bins over -3.35 … 10.8
beta[2]: reference vs stan histograms, 24 bins over -0.32 … 3.08
beta[3]: reference vs stan histograms, 24 bins over 0.20 … 0.63
sigma: reference vs stan histograms, 24 bins over 0.84 … 0.95

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.0910 ≤ tol 0.4362 · floors 0.2181/0.1353

posteriordb-eight_schools / eight_schools_noncentered

answer record(theta[1], theta[2], theta[3], theta[4], theta[5], theta[6], theta[7], theta[8], mu, tau) stan pass 0.2568

00 statement source: posteriordb/eight_schools-eight_schools_noncentered

given

Eight schools each ran an SAT-coaching program. For school j (j = 1..8) a separate analysis produced an estimated treatment effect y_j on test scores together with the known standard error sigma_j of that estimate (each sigma_j is positive); the eight estimates and their eight standard errors are provided as data (J = 8). The population mean mu has a Normal(mean 0, sd 5) prior. The population standard deviation tau, constrained to be positive, has a half-Cauchy(location 0, scale 5) prior.

model

Each school has an unknown true coaching effect theta_j, drawn independently from a Normal distribution with mean mu and standard deviation tau. The observed estimate y_j is then Normal-distributed around that true effect theta_j with the school's known standard error sigma_j.
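The description above states the centered form, theta_j ~ Normal(mu, tau). The Stan realization below instead samples standardized effects theta_trans ~ Normal(0, 1) and sets theta = mu + tau * theta_trans, which gives theta the same Normal(mu, tau) distribution. For comparison only, a sketch of the centered program (not the one posteriordb uses for this entry) is:

```stan
data {
  int<lower=0> J;                // number of schools
  array[J] real y;               // estimated treatment effects
  array[J] real<lower=0> sigma;  // standard errors of the estimates
}
parameters {
  real mu;                       // population mean
  real<lower=0> tau;             // population standard deviation
  vector[J] theta;               // true school effects, sampled directly
}
model {
  mu ~ normal(0, 5);
  tau ~ cauchy(0, 5);            // half-Cauchy via the lower bound on tau
  theta ~ normal(mu, tau);       // centered hierarchical prior
  y ~ normal(theta, sigma);
}
```

Both programs define the same posterior for (mu, tau, theta); what differs is the geometry the sampler sees. With only eight noisy estimates tau is weakly identified, and the centered form typically produces divergent transitions when tau is small, which is the usual reason to prefer the noncentered version.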

query

The marginal posterior distribution of each parameter given the data: the population mean mu, the population standard deviation tau, and the eight true school effects theta_1, ..., theta_8.

answer spec record(theta[1], theta[2], theta[3], theta[4], theta[5], theta[6], theta[7], theta[8], mu, tau)

{
  "kind": "record",
  "fields": {
    "theta[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "theta[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "theta[3]": {
      "kind": "dist",
      "domain": "real"
    },
    "theta[4]": {
      "kind": "dist",
      "domain": "real"
    },
    "theta[5]": {
      "kind": "dist",
      "domain": "real"
    },
    "theta[6]": {
      "kind": "dist",
      "domain": "real"
    },
    "theta[7]": {
      "kind": "dist",
      "domain": "real"
    },
    "theta[8]": {
      "kind": "dist",
      "domain": "real"
    },
    "mu": {
      "kind": "dist",
      "domain": "real"
    },
    "tau": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization 0.257

stan

data {
  int<lower=0> J; // number of schools
  array[J] real y; // estimated treatment effects
  array[J] real<lower=0> sigma; // standard errors of the estimated effects
}
parameters {
  vector[J] theta_trans; // standardized school effects
  real mu; // hyper-parameter of mean
  real<lower=0> tau; // hyper-parameter of sd
}
transformed parameters {
  vector[J] theta;
  // original theta: mu + tau * theta_trans, so theta ~ normal(mu, tau)
  theta = theta_trans * tau + mu;
}
model {
  theta_trans ~ normal(0, 1);
  y ~ normal(theta, sigma);
  mu ~ normal(0, 5); // a weakly informative prior
  tau ~ cauchy(0, 5); // half-Cauchy, given the lower bound on tau
}

//@ DATA { J: 8, y: [8 values], sigma: [8 values] }   // values supplied at runtime
//@ PARAMS ["theta[1]","theta[2]","theta[3]","theta[4]","theta[5]","theta[6]","theta[7]","theta[8]","mu","tau"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay: reference vs stan · record(theta[1], theta[2], theta[3], theta[4], theta[5], theta[6], theta[7], theta[8], mu, tau)

theta[1]: reference vs stan histograms, 24 bins over -13.7 … 33.6
theta[2]: reference vs stan histograms, 24 bins over -18 … 25
theta[3]: reference vs stan histograms, 24 bins over -21.8 … 23.3
theta[4]: reference vs stan histograms, 24 bins over -18.3 … 21.3
theta[5]: reference vs stan histograms, 24 bins over -14.7 … 19.5
theta[6]: reference vs stan histograms, 24 bins over -17.8 … 23.5
theta[7]: reference vs stan histograms, 24 bins over -7.30 … 30.7
theta[8]: reference vs stan histograms, 24 bins over -17.2 … 39
mu: reference vs stan histograms, 24 bins over -6.41 … 13.8
tau: reference vs stan histograms, 24 bins over 0.46 … 21.3

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.2568 ≤ tol 0.8077 · floors 0.3903/0.4039

posteriordb-garch / garch11

answer record(mu, alpha0, alpha1, beta1) stan pass 0.0185

00 statement source: posteriordb/garch-garch11

given

The data consist of T real-valued time series observations y[1], y[2], ..., y[T] and a positive real initial conditional standard deviation sigma1 at time 1. The model has four parameters, none with an explicit prior: mu has a flat (improper uniform) prior over the real line; alpha0, constrained positive, has a flat (improper) prior over (0, infinity); and alpha1 and beta1 have a flat joint prior over the bounded region 0 <= alpha1 <= 1, 0 <= beta1 <= 1 - alpha1, which enforces alpha1 + beta1 <= 1.

model

This is a GARCH(1,1) model for conditional heteroscedasticity in time series. The conditional standard deviation at time 1 is fixed at the given value sigma1. For each time t from 2 to T it follows the recursion sigma[t] = sqrt(alpha0 + alpha1 * (y[t-1] - mu)^2 + beta1 * sigma[t-1]^2), that is, the square root of the sum of alpha0, alpha1 times the squared deviation of the previous observation from mu, and beta1 times the previous conditional variance. Every observation y[t], for t = 1 to T (including y[1], whose standard deviation is sigma1), is normally distributed with mean mu and standard deviation sigma[t].
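The bound beta1 <= 1 - alpha1 keeps the persistence alpha1 + beta1 at or below 1. When the sum is strictly below 1, a GARCH(1,1) process has a finite unconditional (long-run) variance alpha0 / (1 - alpha1 - beta1). As a sketch, with the hypothetical quantity names `persistence` and `uncond_var`, both can be reported per draw by appending a generated quantities block to the same program:

```stan
data {
  int<lower=0> T;
  array[T] real y;
  real<lower=0> sigma1;
}
parameters {
  real mu;
  real<lower=0> alpha0;
  real<lower=0, upper=1> alpha1;
  real<lower=0, upper=(1 - alpha1)> beta1;
}
model {
  // conditional standard deviations from the GARCH(1,1) recursion
  array[T] real sigma;
  sigma[1] = sigma1;
  for (t in 2:T) {
    sigma[t] = sqrt(alpha0 + alpha1 * square(y[t - 1] - mu)
                    + beta1 * square(sigma[t - 1]));
  }
  y ~ normal(mu, sigma);
}
generated quantities {
  // persistence of volatility shocks; below 1 for a finite long-run variance
  real persistence = alpha1 + beta1;
  // unconditional variance alpha0 / (1 - alpha1 - beta1)
  real uncond_var = alpha0 / (1 - persistence);
}
```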

query

The marginal posterior distributions of the four parameters: mu (the mean), alpha0 (the intercept of the conditional variance), alpha1 (the lagged squared residual coefficient), and beta1 (the lagged conditional variance coefficient).

answer spec record(mu, alpha0, alpha1, beta1)

{
  "kind": "record",
  "fields": {
    "mu": {
      "kind": "dist",
      "domain": "real"
    },
    "alpha0": {
      "kind": "dist",
      "domain": "real"
    },
    "alpha1": {
      "kind": "dist",
      "domain": "real"
    },
    "beta1": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization 0.019

stan

data {
  int<lower=0> T;
  array[T] real y;
  real<lower=0> sigma1;
}
parameters {
  real mu;
  real<lower=0> alpha0;
  real<lower=0, upper=1> alpha1;
  real<lower=0, upper=(1 - alpha1)> beta1;
}
model {
  array[T] real sigma;
  sigma[1] = sigma1;
  for (t in 2 : T) {
    sigma[t] = sqrt(alpha0 + alpha1 * square(y[t - 1] - mu)
                    + beta1 * square(sigma[t - 1]));
  }

  y ~ normal(mu, sigma);
}

//@ DATA { T: 200, y: [200 values], sigma1: 0.5 }   // values supplied at runtime
//@ PARAMS ["mu","alpha0","alpha1","beta1"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay: reference vs stan · record(mu, alpha0, alpha1, beta1)

mu: reference vs stan histograms, 24 bins over 4.64 … 5.43
alpha0: reference vs stan histograms, 24 bins over 0.52 … 4.56
alpha1: reference vs stan histograms, 24 bins over 0.26 … 0.91
beta1: reference vs stan histograms, 24 bins over 0.03 … 0.59

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.0185 ≤ tol 0.0708 · floors 0.0197/0.0302

posteriordb-gp_pois_regr / gp_pois_regr

answer record(rho, alpha, f[1], f[2], f[3], f[4], f[5], f[6], f[7], f[8], f[9], f[10], f[11]) stan pass 0.0370

00 statement source: posteriordb/gp_pois_regr-gp_pois_regr

given

For N = 11 observations, the data provide an input location x_i (a real number) and a count observation k_i (a non-negative integer) for each observation i. The model has two hyperparameters: rho, a length-scale parameter constrained to be positive, with a gamma(shape 25, rate 4) prior; and alpha, a marginal standard deviation parameter constrained to be positive, with a half-normal prior of scale 2 (a normal(0, 2) prior restricted to alpha > 0).

model

The observed counts follow a Poisson regression with a latent Gaussian process. For each observation i, the count k_i is Poisson-distributed with log-rate f_i, where f is a latent one-dimensional Gaussian process evaluated at the input locations x_1, ..., x_N. The Gaussian process has an exponentiated-quadratic (squared exponential) covariance kernel with length scale rho and marginal standard deviation alpha. The latent values f_1, ..., f_N are jointly multivariate normal with mean zero and covariance matrix given by the kernel evaluated at the observed input locations.
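The description states the GP prior directly as f ~ MultiNormal(0, K). The Stan realization instead draws standard-normal f_tilde and sets f = L * f_tilde, where L is the Cholesky factor of K plus a 1e-10 diagonal jitter for numerical stability; since L L^T equals that matrix, f has the stated multivariate normal distribution. For comparison only, a sketch of the direct (centered) form is:

```stan
data {
  int<lower=1> N;
  array[N] real x;
  array[N] int k;
}
parameters {
  real<lower=0> rho;
  real<lower=0> alpha;
  vector[N] f;                  // latent GP values, sampled directly
}
model {
  // squared exponential kernel plus a tiny diagonal jitter
  matrix[N, N] cov = gp_exp_quad_cov(x, alpha, rho)
                     + diag_matrix(rep_vector(1e-10, N));
  matrix[N, N] L_cov = cholesky_decompose(cov);
  rho ~ gamma(25, 4);
  alpha ~ normal(0, 2);         // half-normal via the lower bound
  f ~ multi_normal_cholesky(rep_vector(0, N), L_cov);
  k ~ poisson_log(f);
}
```

Both forms target the same posterior; with a Poisson likelihood on only 11 points, the non-centered form used by posteriordb tends to sample more efficiently, and the centered version above is only meant to make the equivalence explicit.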

query

The marginal posterior distributions of the two hyperparameters rho (length scale) and alpha (marginal standard deviation), and the latent Gaussian process values f[1], f[2], f[3], f[4], f[5], f[6], f[7], f[8], f[9], f[10], f[11].

answer spec record(rho, alpha, f[1], f[2], f[3], f[4], f[5], f[6], f[7], f[8], f[9], f[10], f[11])

{
  "kind": "record",
  "fields": {
    "rho": {
      "kind": "dist",
      "domain": "real"
    },
    "alpha": {
      "kind": "dist",
      "domain": "real"
    },
    "f[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "f[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "f[3]": {
      "kind": "dist",
      "domain": "real"
    },
    "f[4]": {
      "kind": "dist",
      "domain": "real"
    },
    "f[5]": {
      "kind": "dist",
      "domain": "real"
    },
    "f[6]": {
      "kind": "dist",
      "domain": "real"
    },
    "f[7]": {
      "kind": "dist",
      "domain": "real"
    },
    "f[8]": {
      "kind": "dist",
      "domain": "real"
    },
    "f[9]": {
      "kind": "dist",
      "domain": "real"
    },
    "f[10]": {
      "kind": "dist",
      "domain": "real"
    },
    "f[11]": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization 0.037

stan

data {
  int<lower=1> N;
  array[N] real x;
  array[N] int k;
}
parameters {
  real<lower=0> rho;
  real<lower=0> alpha;
  vector[N] f_tilde;
}
transformed parameters {
  vector[N] f;
  {
    matrix[N, N] cov = gp_exp_quad_cov(x, alpha, rho)
                       + diag_matrix(rep_vector(1e-10, N));
    matrix[N, N] L_cov = cholesky_decompose(cov);
    f = L_cov * f_tilde;
  }
}
model {
  rho ~ gamma(25, 4);
  alpha ~ normal(0, 2);
  f_tilde ~ normal(0, 1);

  k ~ poisson_log(f);
}

//@ DATA { N: 11, x: [11 values], k: [11 values] }   // values supplied at runtime
//@ PARAMS ["rho","alpha","f[1]","f[2]","f[3]","f[4]","f[5]","f[6]","f[7]","f[8]","f[9]","f[10]","f[11]"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay: reference vs stan · record(rho, alpha, f[1], f[2], f[3], f[4], f[5], f[6], f[7], f[8], f[9], f[10], f[11])

parameter	reference mean±sd	stan mean±sd
rho	5.68 ± 0.694	5.61 ± 0.667
alpha	3.00 ± 0.816	2.89 ± 0.736
f[1]	3.64 ± 0.156	3.64 ± 0.154
f[2]	3.68 ± 0.128	3.69 ± 0.128
f[3]	3.25 ± 0.146	3.26 ± 0.143
f[4]	2.41 ± 0.202	2.42 ± 0.189
f[5]	1.57 ± 0.246	1.58 ± 0.240
f[6]	1.30 ± 0.260	1.31 ± 0.268
f[7]	1.94 ± 0.222	1.93 ± 0.236
f[8]	3.18 ± 0.146	3.17 ± 0.155
f[9]	4.25 ± 0.097	4.25 ± 0.098
f[10]	4.42 ± 0.086	4.41 ± 0.089
f[11]	3.51 ± 0.164	3.51 ± 0.164

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.0370 ≤ tol 0.1404 · floors 0.0666/0.0702

posteriordb-gp_pois_regr / gp_regr

answer record(rho, alpha, sigma) stan pass 0.0563

00 statement source: posteriordb/gp_pois_regr-gp_regr

given

The data comprise N = 11 paired observations of input locations x and output values y. Each input x is a real number, and each output y is a real-valued observation. The model is fit to estimate three parameters: rho, a length-scale parameter constrained positive with a gamma(shape 25, rate 4) prior; alpha, the marginal standard deviation of the Gaussian process, constrained positive with a normal(mean 0, standard deviation 2) prior; and sigma, the noise standard deviation, constrained positive with a normal(mean 0, standard deviation 1) prior.

model

The observed outputs y follow a multivariate normal distribution with mean vector 0 and covariance matrix K + sigma * I, where K is the N by N covariance matrix computed with an exponentiated-quadratic (squared exponential) kernel evaluated at the input locations x, with amplitude (marginal standard deviation) alpha and length scale rho. Note that sigma itself, not sigma squared, is added to each diagonal element, so despite being called the noise standard deviation, sigma acts as the per-observation noise variance; the implied noise standard deviation is sqrt(sigma).
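Because sigma is added directly to the covariance diagonal, it is a variance on the scale of y. A sketch that leaves the posteriordb model unchanged and reports the implied per-observation noise standard deviation, under the hypothetical name `noise_sd`:

```stan
data {
  int<lower=1> N;
  array[N] real x;
  vector[N] y;
}
parameters {
  real<lower=0> rho;
  real<lower=0> alpha;
  real<lower=0> sigma;
}
model {
  // sigma enters the diagonal as a variance term, not as sigma^2
  matrix[N, N] cov = gp_exp_quad_cov(x, alpha, rho)
                     + diag_matrix(rep_vector(sigma, N));
  matrix[N, N] L_cov = cholesky_decompose(cov);
  rho ~ gamma(25, 4);
  alpha ~ normal(0, 2);
  sigma ~ normal(0, 1);
  y ~ multi_normal_cholesky(rep_vector(0, N), L_cov);
}
generated quantities {
  // implied noise standard deviation on each observation
  real noise_sd = sqrt(sigma);
}
```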

query

The marginal posterior distributions of the three parameters: rho (the length scale), alpha (the Gaussian process amplitude), and sigma (the noise standard deviation).

answer spec record(rho, alpha, sigma)

{
  "kind": "record",
  "fields": {
    "rho": {
      "kind": "dist",
      "domain": "real"
    },
    "alpha": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization 0.056

stan

data {
  int<lower=1> N;
  array[N] real x;
  vector[N] y;
}
parameters {
  real<lower=0> rho;
  real<lower=0> alpha;
  real<lower=0> sigma;
}
model {
  // sigma is added to the diagonal directly, so it acts as a noise variance
  matrix[N, N] cov = gp_exp_quad_cov(x, alpha, rho)
                     + diag_matrix(rep_vector(sigma, N));
  matrix[N, N] L_cov = cholesky_decompose(cov);

  rho ~ gamma(25, 4);
  alpha ~ normal(0, 2);
  sigma ~ normal(0, 1);

  y ~ multi_normal_cholesky(rep_vector(0, N), L_cov);
}

//@ DATA { N: 11, x: [11 values], y: [11 values] }   // values supplied at runtime
//@ PARAMS ["rho","alpha","sigma"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay: reference vs stan · record(rho, alpha, sigma)

rho: reference vs stan histograms, 24 bins over 3.68 … 11.6
alpha: reference vs stan histograms, 24 bins over 0.94 … 4.91
sigma: reference vs stan histograms, 24 bins over 0.86 … 3.81

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.0563 ≤ tol 0.2085 · floors 0.0645/0.0650

posteriordb-hmm_example / hmm_example

answer record(theta1[1], theta1[2], theta2[1], theta2[2], mu[1], mu[2]) stan pass 0.0092

00 statement source: posteriordb/hmm_example-hmm_example

given

For a sequence of N observations indexed t = 1 to N, with K = 2 hidden states, the data provide an array y of N real-valued observations. The model has two transition probability vectors: theta1, the probabilities of moving from state 1 to each state, and theta2, the probabilities of moving from state 2 to each state. Each is a simplex (non-negative components summing to 1) with a flat prior, which on the simplex is the proper uniform (Dirichlet(1, 1)) distribution. The model also has state-specific emission means mu[1] and mu[2], constrained to be positive and strictly ordered, 0 < mu[1] < mu[2], with Normal priors mu[1] ~ Normal(3, 1) and mu[2] ~ Normal(10, 1).

model

A Hidden Markov Model generates the sequence of observations. At time t = 1, the hidden state is drawn uniformly at random from the K states (no explicit prior). At each subsequent time t = 2, 3, ..., N, the hidden state transitions according to the transition probabilities: if the state at time t-1 is state j, the state at time t is drawn from the categorical distribution with probabilities given by theta[j] (either theta1 if j=1, or theta2 if j=2). Given the hidden state k at time t, the observation y[t] is drawn from a Normal distribution with mean mu[k] and fixed standard deviation 1. Thus each observation y[t] ~ Normal(mu[state[t]], 1), where state[t] evolves according to the Markov transition probabilities across the entire sequence.
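The likelihood marginalizes over the hidden state sequence with the forward algorithm. In log space, writing gamma_t(k) for the log joint density of y_1, ..., y_t and state k at time t (shifted by log K because the uniform initial-state probability 1/K is dropped), the recursion the Stan realization implements is:

```latex
\begin{aligned}
\gamma_1(k) &= \log \mathcal{N}(y_1 \mid \mu_k, 1), \quad k = 1, \dots, K,\\
\gamma_t(k) &= \log \sum_{j=1}^{K} \exp\big(\gamma_{t-1}(j) + \log \theta_{j,k}\big)
              + \log \mathcal{N}(y_t \mid \mu_k, 1), \quad t = 2, \dots, N,\\
\log p(y_{1:N} \mid \theta, \mu) &= \log \sum_{k=1}^{K} \exp\big(\gamma_N(k)\big) + \log(1/K).
\end{aligned}
```

Here theta_{j,k} is component k of theta1 (if j = 1) or theta2 (if j = 2). The code places the emission term inside its log_sum_exp, which is equivalent because that term does not depend on j; the constant log(1/K) does not affect sampling and is omitted there.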

query

The marginal posterior distributions of the six parameters: theta1[1] and theta1[2] (the transition probabilities from state 1), theta2[1] and theta2[2] (the transition probabilities from state 2), and mu[1] and mu[2] (the state-specific observation means, ordered so that mu[1] < mu[2]).

answer spec record(theta1[1], theta1[2], theta2[1], theta2[2], mu[1], mu[2])

{
  "kind": "record",
  "fields": {
    "theta1[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "theta1[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "theta2[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "theta2[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "mu[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "mu[2]": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization 0.009

stan

// simple hmm example (1 output; 2 states)
data {
  int<lower=0> N;
  int<lower=0> K;
  array[N] real y;
}
parameters {
  simplex[K] theta1;
  simplex[K] theta2;
  // real mu[K];
  positive_ordered[K] mu;
}
transformed parameters {
  array[K] simplex[K] theta;
  theta[1] = theta1;
  theta[2] = theta2;
}
model {
  // priors
  target += normal_lpdf(mu[1] | 3, 1);
  target += normal_lpdf(mu[2] | 10, 1);
  // forward algorithm
  {
    array[K] real acc;
    array[N, K] real gamma;
    for (k in 1 : K) {
      gamma[1, k] = normal_lpdf(y[1] | mu[k], 1);
    }
    for (t in 2 : N) {
      for (k in 1 : K) {
        for (j in 1 : K) {
          acc[j] = gamma[t - 1, j] + log(theta[j, k])
                   + normal_lpdf(y[t] | mu[k], 1);
        }
        gamma[t, k] = log_sum_exp(acc);
      }
    }
    target += log_sum_exp(gamma[N]);
  }
}
generated quantities {
  array[N] int<lower=1, upper=K> z_star;
  real log_p_z_star;
  {
    array[N, K] int back_ptr;
    array[N, K] real best_logp;
    for (k in 1 : K) {
      best_logp[1, k] = normal_lpdf(y[1] | mu[k], 1);
    }
    for (t in 2 : N) {
      for (k in 1 : K) {
        best_logp[t, k] = negative_infinity();
        for (j in 1 : K) {
          real logp;
          logp = best_logp[t - 1, j] + log(theta[j, k])
                 + normal_lpdf(y[t] | mu[k], 1);
          if (logp > best_logp[t, k]) {
            back_ptr[t, k] = j;
            best_logp[t, k] = logp;
          }
        }
      }
    }
    log_p_z_star = max(best_logp[N]);
    for (k in 1 : K) {
      if (best_logp[N, k] == log_p_z_star) {
        z_star[N] = k;
      }
    }
    for (t in 1 : (N - 1)) {
      z_star[N - t] = back_ptr[N - t + 1, z_star[N - t + 1]];
    }
  }
}

//@ DATA { N: 100, K: 2, y: [100 values] }   // values supplied at runtime
//@ PARAMS ["theta1[1]","theta1[2]","theta2[1]","theta2[2]","mu[1]","mu[2]"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}
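
The two recursions in this HMM program can also be checked outside Stan. The first is the forward algorithm in the model block, which marginalizes the hidden states with log_sum_exp. The second is the Viterbi pass with back-pointers in generated quantities. Below is a minimal pure-Python sketch that assumes the same conventions as the Stan code:

- emissions are Normal(mu[k], 1);
- theta[j][k] is the probability of moving from state j to state k;
- there is no separate initial-state distribution, so the first step uses only the emission density.

Internally the sketch uses 0-based indices. The returned path uses 1-based labels, like z_star. The function names are illustrative, not part of any library.

```python
import math

def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))

def normal_lpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)

def hmm_forward(y, theta, mu):
    """log p(y | theta, mu) with the hidden states summed out (model block)."""
    K = len(mu)
    gamma = [normal_lpdf(y[0], mu[k], 1.0) for k in range(K)]
    for t in range(1, len(y)):
        gamma = [
            log_sum_exp([gamma[j] + math.log(theta[j][k])
                         + normal_lpdf(y[t], mu[k], 1.0) for j in range(K)])
            for k in range(K)
        ]
    return log_sum_exp(gamma)

def hmm_viterbi(y, theta, mu):
    """Most probable state path (1-based labels) and its log probability."""
    K, N = len(mu), len(y)
    best = [[normal_lpdf(y[0], mu[k], 1.0) for k in range(K)]]
    back = [[0] * K]  # back-pointers for t = 0 are never used
    for t in range(1, N):
        row, ptr = [], []
        for k in range(K):
            cands = [best[t - 1][j] + math.log(theta[j][k])
                     + normal_lpdf(y[t], mu[k], 1.0) for j in range(K)]
            j_best = max(range(K), key=lambda j: cands[j])
            row.append(cands[j_best])
            ptr.append(j_best)
        best.append(row)
        back.append(ptr)
    log_p = max(best[-1])
    path = [best[-1].index(log_p)]
    for t in range(N - 1, 0, -1):  # follow back-pointers from the end
        path.append(back[t][path[-1]])
    path.reverse()
    return [s + 1 for s in path], log_p
```

As an illustration, take the made-up values y = [3.1, 2.8, 9.7, 10.2, 3.0], theta = [[0.9, 0.1], [0.1, 0.9]] and mu = [3.0, 10.0]. For these, hmm_viterbi returns the path [1, 1, 2, 2, 1]. The forward pass sums over all state paths, while Viterbi keeps only the best one. So hmm_forward is never smaller than the Viterbi log probability.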

02 answer overlay: reference vs stan · record(theta1[1], theta1[2], theta2[1], theta2[2], mu[1], mu[2])

theta1[1]

reference vs stan · 24 bins · range 0.28 to 0.91

theta1[2]

reference vs stan · 24 bins · range 0.09 to 0.72

theta2[1]

reference vs stan · 24 bins · range 0.01 to 0.20

theta2[2]

reference vs stan · 24 bins · range 0.80 to 0.99

mu[1]

reference vs stan · 24 bins · range 2.43 to 3.73

mu[2]

reference vs stan · 24 bins · range 8.47 to 9.24

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.0092 ≤ tol 0.0514 · floors 0.0257/0.0135


posteriordb-hudson_lynx_hare / lotka_volterra

answer record(theta[1], theta[2], theta[3], theta[4], z_init[1], z_init[2], sigma[1], sigma[2]) stan pass 0.1012

00 statement source: posteriordb/hudson_lynx_hare-lotka_volterra

given

For each of N measurement time points, observations are provided: the time ts[n] and measured populations y[n,1] (prey) and y[n,2] (predator). Initial measured populations y_init[1] (prey) and y_init[2] (predator) are also observed, all positive real numbers. The model includes four positive system parameters theta[1] (prey birth rate), theta[2] (predation rate on prey), theta[3] (predator death rate), and theta[4] (predator efficiency); two positive initial populations z_init[1] (prey) and z_init[2] (predator) at time 0; and two positive measurement error standard deviations sigma[1] (prey) and sigma[2] (predator). The priors are: theta[1] and theta[3] each have a normal distribution with mean 1 and standard deviation 0.5; theta[2] and theta[4] each have a normal distribution with mean 0.05 and standard deviation 0.05; z_init[1] and z_init[2] each have a lognormal distribution with location (log-scale mean) log(10) and scale 1; sigma[1] and sigma[2] each have a lognormal distribution with location -1 and scale 1.

model

The latent prey and predator populations evolve according to the Lotka-Volterra differential equations. Denoting prey as u and predator as v, the system is du/dt = (theta[1] - theta[2]*v)*u and dv/dt = (-theta[3] + theta[4]*u)*v, with initial conditions u(0) = z_init[1] and v(0) = z_init[2]. The system is integrated numerically at each of the N measurement times to yield latent population trajectories. The observed initial populations y_init[1] and y_init[2] are independent lognormal random variables: y_init[k] has distribution lognormal with location log(z_init[k]) and scale sigma[k] for k = 1, 2. At each measurement time n, the observed populations y[n,1] and y[n,2] are independent, with y[n,k] distributed as lognormal with location log of the corresponding latent population and scale sigma[k].
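
The generative structure described above can be sketched in a few lines of pure Python. This is an illustration only, with three stated assumptions:

- it swaps Stan's adaptive integrate_ode_rk45 solver for a fixed-step classical Runge-Kutta (RK4) integrator;
- it assumes the measurement times ts are positive and increasing;
- the function names are hypothetical.

The parameter values used to exercise it are arbitrary, not posterior estimates.

```python
import math

def dz_dt(z, theta):
    """Lotka-Volterra right-hand side; z = (u, v) = (prey, predator)."""
    u, v = z
    alpha, beta, gamma, delta = theta
    return ((alpha - beta * v) * u, (-gamma + delta * u) * v)

def rk4_solve(z0, theta, ts, steps_per_unit=1000):
    """Integrate from t = 0 and return the state at each time in ts.

    Fixed-step classical RK4, for illustration only; the Stan program
    uses the adaptive integrate_ode_rk45 solver instead.
    """
    out, z, t = [], tuple(z0), 0.0
    for t_next in ts:
        n = max(1, int(math.ceil((t_next - t) * steps_per_unit)))
        h = (t_next - t) / n
        for _ in range(n):
            k1 = dz_dt(z, theta)
            k2 = dz_dt(tuple(zi + 0.5 * h * ki for zi, ki in zip(z, k1)), theta)
            k3 = dz_dt(tuple(zi + 0.5 * h * ki for zi, ki in zip(z, k2)), theta)
            k4 = dz_dt(tuple(zi + h * ki for zi, ki in zip(z, k3)), theta)
            z = tuple(zi + h / 6 * (a + 2 * b + 2 * c + d)
                      for zi, a, b, c, d in zip(z, k1, k2, k3, k4))
        out.append(z)
        t = t_next
    return out

def lognormal_lpdf(y, mu, sigma):
    """Log density of LogNormal(mu, sigma) at y > 0."""
    ly = math.log(y)
    return -0.5 * ((ly - mu) / sigma) ** 2 - math.log(sigma * y * math.sqrt(2 * math.pi))

def log_likelihood(y_init, y, ts, theta, z_init, sigma):
    """Sum of the lognormal observation terms described in the model paragraph."""
    z = rk4_solve(z_init, theta, ts)
    total = 0.0
    for k in range(2):
        total += lognormal_lpdf(y_init[k], math.log(z_init[k]), sigma[k])
        for n in range(len(ts)):
            total += lognormal_lpdf(y[n][k], math.log(z[n][k]), sigma[k])
    return total
```

A useful check on any solver for this system is the conserved quantity delta*u - gamma*log(u) + beta*v - alpha*log(v), which stays constant along exact trajectories. The Stan program sets a relative tolerance of 1e-5 and an absolute tolerance of 1e-3. Trajectories from this sketch will therefore agree with Stan's only up to integration error.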

query

The marginal posterior distributions of the eight parameters: theta[1] (prey birth rate), theta[2] (predation rate on prey), theta[3] (predator death rate), theta[4] (predator efficiency), z_init[1] (initial prey population), z_init[2] (initial predator population), sigma[1] (measurement error standard deviation for prey), and sigma[2] (measurement error standard deviation for predator).

answer spec record(theta[1], theta[2], theta[3], theta[4], z_init[1], z_init[2], sigma[1], sigma[2])

{
  "kind": "record",
  "fields": {
    "theta[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "theta[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "theta[3]": {
      "kind": "dist",
      "domain": "real"
    },
    "theta[4]": {
      "kind": "dist",
      "domain": "real"
    },
    "z_init[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "z_init[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma[2]": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

system prompt constant across problems

(system prompt loads here)

01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization 0.101

stan

functions {
  array[] real dz_dt(real t, // time
                     array[] real z,
                     // system state {prey, predator}
                     array[] real theta, // parameters
                     array[] real x_r, // unused data
                     array[] int x_i) {
    real u = z[1];
    real v = z[2];

    real alpha = theta[1];
    real beta = theta[2];
    real gamma = theta[3];
    real delta = theta[4];

    real du_dt = (alpha - beta * v) * u;
    real dv_dt = (-gamma + delta * u) * v;
    return {du_dt, dv_dt};
  }
}
data {
  int<lower=0> N; // number of measurement times
  array[N] real ts; // measurement times > 0
  array[2] real y_init; // initial measured populations
  array[N, 2] real<lower=0> y; // measured populations
}
parameters {
  array[4] real<lower=0> theta; // { alpha, beta, gamma, delta }
  array[2] real<lower=0> z_init; // initial population
  array[2] real<lower=0> sigma; // measurement errors
}
transformed parameters {
  array[N, 2] real z = integrate_ode_rk45(dz_dt, z_init, 0, ts, theta,
                                          rep_array(0.0, 0),
                                          rep_array(0, 0), 1e-5, 1e-3, 5e2);
}
model {
  theta[{1, 3}] ~ normal(1, 0.5);
  theta[{2, 4}] ~ normal(0.05, 0.05);
  sigma ~ lognormal(-1, 1);
  z_init ~ lognormal(log(10), 1);
  for (k in 1 : 2) {
    y_init[k] ~ lognormal(log(z_init[k]), sigma[k]);
    y[ : , k] ~ lognormal(log(z[ : , k]), sigma[k]);
  }
}
generated quantities {
  array[2] real y_init_rep;
  array[N, 2] real y_rep;
  for (k in 1 : 2) {
    y_init_rep[k] = lognormal_rng(log(z_init[k]), sigma[k]);
    for (n in 1 : N) {
      y_rep[n, k] = lognormal_rng(log(z[n, k]), sigma[k]);
    }
  }
}

//@ DATA { N: 20, ts: [20 values], y_init: [2 values], y: [20×2 matrix] }   // values supplied at runtime
//@ PARAMS ["theta[1]","theta[2]","theta[3]","theta[4]","z_init[1]","z_init[2]","sigma[1]","sigma[2]"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay: reference vs stan · record(theta[1], theta[2], theta[3], theta[4], z_init[1], z_init[2], sigma[1], sigma[2])

theta[1]

reference vs stan · 24 bins · range 0.36 to 0.78

theta[2]

reference vs stan · 24 bins · range 0.02 to 0.04

theta[3]

reference vs stan · 24 bins · range 0.56 to 1.16

theta[4]

reference vs stan · 24 bins · range 0.02 to 0.04

z_init[1]

reference vs stan · 24 bins · range 25.0 to 42.6

z_init[2]

reference vs stan · 24 bins · range 4.26 to 7.84

sigma[1]

reference vs stan · 24 bins · range 0.16 to 0.46

sigma[2]

reference vs stan · 24 bins · range 0.16 to 0.41

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.1012 ≤ tol 0.4363 · floors 0.2181/0.1800


posteriordb-kidiq / kidscore_interaction

answer record(beta[1], beta[2], beta[3], beta[4], sigma) stan pass 0.5886

00 statement source: posteriordb/kidiq-kidscore_interaction

given

For each of N = 434 children, the data provide the child's cognitive test score (bounded between 0 and 200), a binary indicator of whether the child's mother completed high school (1 = yes, 0 = no), and the child's mother's IQ score (bounded between 0 and 200). The model includes four regression coefficients—an intercept and slopes on the mother's high school completion, the mother's IQ, and the interaction between these two predictors—each with a flat (improper uniform) prior over the real line. The error standard deviation sigma, constrained positive, has a half-Cauchy(location 0, scale 2.5) prior.

model

Each child's cognitive test score is normally distributed with a mean equal to the intercept plus the slope on mother's high school completion times the indicator, plus the slope on mother's IQ times the IQ score, plus the slope on the interaction between these two predictors times their product, and a common standard deviation sigma across all children.
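
The interaction can be made concrete. With mom_hs coded 0/1, the slope on mother's IQ is beta[3] for children whose mother did not complete high school. For children whose mother did, it is beta[3] + beta[4].

Below is a minimal Python sketch of the mean function and the unnormalized log posterior. It assumes 0-based indexing, so Python beta[0] is the document's beta[1], and so on. The function names are hypothetical, and the coefficient values used to exercise it are arbitrary illustrations, not estimates.

```python
import math

def mean_score(beta, mom_hs, mom_iq):
    """Linear predictor: intercept + hs + iq + hs*iq interaction."""
    return beta[0] + beta[1] * mom_hs + beta[2] * mom_iq + beta[3] * mom_hs * mom_iq

def iq_slope(beta, mom_hs):
    """Expected change in kid_score per additional point of mom_iq, given mom_hs."""
    return beta[2] + beta[3] * mom_hs

def log_posterior_unnormalized(beta, sigma, kid_score, mom_hs, mom_iq):
    """Normal log-likelihood plus the half-Cauchy(0, 2.5) log prior on sigma.

    The flat priors on beta and all additive constants are dropped.
    """
    if sigma <= 0:
        return -math.inf
    lp = -math.log1p((sigma / 2.5) ** 2)  # half-Cauchy kernel
    for y, h, q in zip(kid_score, mom_hs, mom_iq):
        z = (y - mean_score(beta, h, q)) / sigma
        lp += -0.5 * z * z - math.log(sigma)
    return lp
```

For example, with beta = [-10.0, 50.0, 1.0, -0.5], iq_slope gives 1.0 when mom_hs = 0 and 0.5 when mom_hs = 1. The sign of beta[4] (Python beta[3]) therefore tells whether the IQ gradient is flatter or steeper in the high-school group.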

query

The marginal posterior distributions of the five parameters: the intercept (reported as beta[1]), the slope on the mother's high school indicator (reported as beta[2]), the slope on the mother's IQ (reported as beta[3]), the slope on the interaction (reported as beta[4]), and the error standard deviation sigma.

answer spec record(beta[1], beta[2], beta[3], beta[4], sigma)

{
  "kind": "record",
  "fields": {
    "beta[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[3]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[4]": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

system prompt constant across problems

(system prompt loads here)

01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization 0.589

stan

data {
  int<lower=0> N;
  vector<lower=0, upper=200>[N] kid_score;
  vector<lower=0, upper=200>[N] mom_iq;
  vector<lower=0, upper=1>[N] mom_hs;
}
transformed data {
  // interaction
  vector[N] inter;
  inter = mom_hs .* mom_iq;
}
parameters {
  vector[4] beta;
  real<lower=0> sigma;
}
model {
  sigma ~ cauchy(0, 2.5);
  kid_score ~ normal(beta[1] + beta[2] * mom_hs + beta[3] * mom_iq
                     + beta[4] * inter, sigma);
}

//@ DATA { N: 434, kid_score: [434 values], mom_hs: [434 values], mom_iq: [434 values] }   // values supplied at runtime
//@ PARAMS ["beta[1]","beta[2]","beta[3]","beta[4]","sigma"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay: reference vs stan · record(beta[1], beta[2], beta[3], beta[4], sigma)

beta[1]

reference vs stan · 24 bins · range -45.2 to 32.8

beta[2]

reference vs stan · 24 bins · range -1.70 to 90.1

beta[3]

reference vs stan · 24 bins · range 0.49 to 1.37

beta[4]

reference vs stan · 24 bins · range -0.88 to 0.07

sigma

reference vs stan · 24 bins · range 16.4 to 19.8

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.5886 ≤ tol 2.9477 · floors 1.0106/1.4738


posteriordb-kidiq / kidscore_momhs

answer record(beta[1], beta[2], sigma) stan pass 0.0842

00 statement source: posteriordb/kidiq-kidscore_momhs

given

For each of N = 434 children the data provide the child's cognitive test score (between 0 and 200) and a binary indicator of whether the child's mother completed high school (1 = yes, 0 = no). The regression has two coefficients, an intercept and a slope on the high-school indicator, each with a flat (improper uniform) prior over the real line. The error standard deviation sigma, constrained positive, has a half-Cauchy(location 0, scale 2.5) prior.

model

Each child's test score is Normal-distributed with a mean equal to the intercept plus the slope times the mother's high-school indicator, and a common standard deviation sigma across all children.
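
The coefficients have flat priors. So, given sigma, the posterior of (beta[1], beta[2]) is normal and centered at the least-squares fit, and the posterior means equal the least-squares estimates. With a single 0/1 predictor those estimates are plain group averages:

- beta[1] is the mean score of children whose mother did not complete high school;
- beta[2] is the difference between the two group means.

A small Python sketch of that calculation follows. The function name is hypothetical.

```python
def group_mean_estimates(kid_score, mom_hs):
    """Least-squares (= flat-prior posterior mean) estimates (b1, b2) for
    kid_score ~ normal(b1 + b2 * mom_hs, sigma) with a 0/1 predictor."""
    g0 = [y for y, h in zip(kid_score, mom_hs) if h == 0]
    g1 = [y for y, h in zip(kid_score, mom_hs) if h == 1]
    if not g0 or not g1:
        raise ValueError("both groups must be non-empty")
    mean0 = sum(g0) / len(g0)
    mean1 = sum(g1) / len(g1)
    return mean0, mean1 - mean0
```

For made-up scores [70, 80, 90, 100] with mom_hs = [0, 0, 1, 1], this returns (75.0, 20.0).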

query

The marginal posterior distribution of each of the three parameters: the intercept (reported as beta[1]), the slope on the mother's high-school indicator (reported as beta[2]), and the error standard deviation sigma.

answer spec record(beta[1], beta[2], sigma)

{
  "kind": "record",
  "fields": {
    "beta[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

system prompt constant across problems

(system prompt loads here)

01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization 0.084

stan

data {
  int<lower=0> N;
  vector<lower=0, upper=200>[N] kid_score;
  vector<lower=0, upper=1>[N] mom_hs;
}
parameters {
  vector[2] beta;
  real<lower=0> sigma;
}
model {
  sigma ~ cauchy(0, 2.5);
  kid_score ~ normal(beta[1] + beta[2] * mom_hs, sigma);
}

//@ DATA { N: 434, kid_score: [434 values], mom_hs: [434 values] }   // values supplied at runtime
//@ PARAMS ["beta[1]","beta[2]","sigma"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay: reference vs stan · record(beta[1], beta[2], sigma)

beta[1]

reference vs stan · 24 bins · range 70.9 to 83.8

beta[2]

reference vs stan · 24 bins · range 3.69 to 19.4

sigma

reference vs stan · 24 bins · range 18 to 22.0

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.0842 ≤ tol 0.2948 · floors 0.1474/0.1196


posteriordb-kidiq / kidscore_momhsiq

answer record(beta[1], beta[2], beta[3], sigma) stan pass 0.2133

00 statement source: posteriordb/kidiq-kidscore_momhsiq

given

For each of N = 434 children, the data provide the child's cognitive test score (between 0 and 200), an indicator of whether the mother completed high school (1 = yes, 0 = no), and the mother's IQ score. The regression has three coefficients: an intercept, a slope on the high-school indicator, and a slope on the mother's IQ, each with a flat (improper uniform) prior over the real line. The error standard deviation sigma, constrained positive, has a half-Cauchy(location 0, scale 2.5) prior.

model

Each child's test score is Normal-distributed with a mean equal to the intercept plus the high-school coefficient times the mother's high-school indicator, plus the IQ coefficient times the mother's IQ score, and a common standard deviation sigma across all children.
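
Because the coefficients have flat priors, their posterior mean coincides with the ordinary least-squares estimate. In this model beta[2] shifts the regression line up or down by group, while beta[3] is an IQ slope shared by both groups.

The sketch below solves the normal equations (XᵀX)b = Xᵀy for the design rows [1, mom_hs, mom_iq]. It is pure Python and illustrative only; a real analysis would call a numerical linear-algebra library. The function names are hypothetical.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X^T X) b = X^T y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    M = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        for r in range(p):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][p] for i in range(p)]

def design(mom_hs, mom_iq):
    """Rows [1, mom_hs, mom_iq]: intercept, high-school indicator, mother's IQ."""
    return [[1.0, float(h), float(q)] for h, q in zip(mom_hs, mom_iq)]
```

As a sanity check, noise-free data generated as 20 + 6 * mom_hs + 0.5 * mom_iq are recovered as approximately [20, 6, 0.5].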

query

The marginal posterior distribution of each of the four parameters: the intercept (reported as beta[1]), the mother's high school completion effect (reported as beta[2]), the mother's IQ effect (reported as beta[3]), and the error standard deviation sigma.

answer spec record(beta[1], beta[2], beta[3], sigma)

{
  "kind": "record",
  "fields": {
    "beta[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[3]": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

system prompt constant across problems

(system prompt loads here)

01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization 0.213

stan

data {
  int<lower=0> N;
  vector<lower=0, upper=200>[N] kid_score;
  vector<lower=0, upper=200>[N] mom_iq;
  vector<lower=0, upper=1>[N] mom_hs;
}
parameters {
  vector[3] beta;
  real<lower=0> sigma;
}
model {
  sigma ~ cauchy(0, 2.5);
  kid_score ~ normal(beta[1] + beta[2] * mom_hs + beta[3] * mom_iq, sigma);
}

//@ DATA { N: 434, kid_score: [434 values], mom_hs: [434 values], mom_iq: [434 values] }   // values supplied at runtime
//@ PARAMS ["beta[1]","beta[2]","beta[3]","sigma"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay: reference vs stan · record(beta[1], beta[2], beta[3], sigma)

beta[1]

reference vs stan · 24 bins · range 6.90 to 43

beta[2]

reference vs stan · 24 bins · range 0.11 to 11.5

beta[3]

reference vs stan · 24 bins · range 0.38 to 0.74

sigma

reference vs stan · 24 bins · range 15.8 to 19.9

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.2133 ≤ tol 1.2171 · floors 0.6085/0.4829


posteriordb-kidiq / kidscore_momiq

answer record(beta[1], beta[2], sigma) stan pass 0.3178

00 statement source: posteriordb/kidiq-kidscore_momiq

given

For each of N = 434 children the data provide the child's cognitive test score (between 0 and 200) and the mother's IQ score (between 0 and 200). The regression has two coefficients, an intercept and a slope on maternal IQ, each with a flat (improper uniform) prior over the real line. The error standard deviation sigma, constrained positive, has a half-Cauchy(location 0, scale 2.5) prior.

model

Each child's test score is Normal-distributed with a mean equal to the intercept plus the slope times the mother's IQ score, and a common standard deviation sigma across all children.
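
With flat priors on both coefficients, the posterior means equal the familiar closed-form least-squares estimates:

- slope = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²
- intercept = ȳ - slope · x̄

Note that the intercept here is the predicted score at mom_iq = 0, an extrapolation far below any observed maternal IQ. This is one motivation for the centered parameterizations used elsewhere in this collection. Below is a minimal Python sketch; the function name is hypothetical.

```python
def simple_regression(x, y):
    """Least-squares intercept and slope for y ~ a + b * x.

    With flat priors on (a, b), these are also the posterior means.
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope
```

For noise-free made-up data on the line 26 + 0.6 * mom_iq, the function returns approximately (26, 0.6).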

query

The marginal posterior distribution of each of the three parameters: the intercept (reported as beta[1]), the slope on maternal IQ (reported as beta[2]), and the error standard deviation sigma.

answer spec record(beta[1], beta[2], sigma)

{
  "kind": "record",
  "fields": {
    "beta[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

system prompt constant across problems

(system prompt loads here)

01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization 0.318

stan

data {
  int<lower=0> N;
  vector<lower=0, upper=200>[N] kid_score;
  vector<lower=0, upper=200>[N] mom_iq;
}
parameters {
  vector[2] beta;
  real<lower=0> sigma;
}
model {
  sigma ~ cauchy(0, 2.5);
  kid_score ~ normal(beta[1] + beta[2] * mom_iq, sigma);
}

//@ DATA { N: 434, kid_score: [434 values], mom_iq: [434 values] }   // values supplied at runtime
//@ PARAMS ["beta[1]","beta[2]","sigma"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay: reference vs stan · record(beta[1], beta[2], sigma)

beta[1]

reference vs stan · 24 bins · range 5.70 to 42.5

beta[2]

reference vs stan · 24 bins · range 0.45 to 0.79

sigma

reference vs stan · 24 bins · range 16.7 to 20.3

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.3178 ≤ tol 0.7203 · floors 0.3601/0.3502


posteriordb-kidiq_with_mom_work / kidscore_interaction_c

answer record(beta[1], beta[2], beta[3], beta[4], sigma) stan pass 0.0660

00 statement source: posteriordb/kidiq_with_mom_work-kidscore_interaction_c

given

For each of N = 434 children, the data provide the child's cognitive test score and two maternal measures: the mother's high school completion status and the mother's IQ score. To make the coefficients easier to interpret, the model centers both maternal predictors by subtracting their sample means. The regression includes four coefficients: an intercept, a slope for centered mother's high school, a slope for centered mother's IQ, and a slope for their interaction. The regression coefficients beta[1], beta[2], beta[3], and beta[4] each have a flat (improper uniform) prior over the reals. The error standard deviation sigma, constrained to be positive, has a flat (improper uniform) prior over the positive reals.

model

Each child's test score is normally distributed with a mean equal to the intercept plus the slope for centered mother's high school times that child's centered mother's high school value, plus the slope for centered mother's IQ times that child's centered mother's IQ value, plus the interaction coefficient times the product of the centered mother's high school and centered mother's IQ values. The standard deviation of this normal distribution is sigma, shared across all children.
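
Centering changes what the coefficients mean but not the fitted mean function. Write b1..b4 for the uncentered coefficients of the kidscore_interaction model. Substitute hs = c_hs + m_hs and iq = c_iq + m_iq, where m_hs and m_iq are the sample means, and expand b1 + b2·hs + b3·iq + b4·hs·iq. This gives:

- beta[1] = b1 + b2·m_hs + b3·m_iq + b4·m_hs·m_iq
- beta[2] = b2 + b4·m_iq
- beta[3] = b3 + b4·m_hs
- beta[4] = b4

The Python sketch below applies this mapping and checks that both parameterizations give identical predictions. The function names are hypothetical and the coefficient values arbitrary.

```python
def center(xs):
    """Subtract the sample mean, as in the transformed data block."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs], m

def to_centered(b, m_hs, m_iq):
    """Map uncentered coefficients b = [b1, b2, b3, b4] to centered ones."""
    b1, b2, b3, b4 = b
    return [b1 + b2 * m_hs + b3 * m_iq + b4 * m_hs * m_iq,
            b2 + b4 * m_iq,
            b3 + b4 * m_hs,
            b4]

def predict(beta, hs, iq):
    """Intercept + hs + iq + hs*iq, for either parameterization."""
    return beta[0] + beta[1] * hs + beta[2] * iq + beta[3] * hs * iq
```

In the centered model, beta[1] is the predicted score at average maternal characteristics. Likewise, beta[2] is the high-school effect at the average mother's IQ.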

query

The marginal posterior distribution of each of the five parameters: the intercept (reported as beta[1]), the coefficient for centered mother's high school measure (reported as beta[2]), the coefficient for centered mother's IQ (reported as beta[3]), the coefficient for the interaction between centered mother's high school measure and centered mother's IQ (reported as beta[4]), and the error standard deviation sigma.

answer spec record(beta[1], beta[2], beta[3], beta[4], sigma)

{
  "kind": "record",
  "fields": {
    "beta[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[3]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[4]": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

system prompt constant across problems

(system prompt loads here)

01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization 0.066

stan

data {
  int<lower=0> N;
  vector[N] kid_score;
  vector[N] mom_hs;
  vector[N] mom_iq;
}
transformed data {
  // centered predictors
  vector[N] c_mom_hs;
  vector[N] c_mom_iq;
  vector[N] inter;
  c_mom_hs = mom_hs - mean(mom_hs);
  c_mom_iq = mom_iq - mean(mom_iq);
  inter = c_mom_hs .* c_mom_iq;
}
parameters {
  vector[4] beta;
  real<lower=0> sigma;
}
model {
  kid_score ~ normal(beta[1] + beta[2] * c_mom_hs + beta[3] * c_mom_iq
                     + beta[4] * inter, sigma);
}

//@ DATA { N: 434, kid_score: [434 values], mom_hs: [434 values], mom_iq: [434 values] }   // values supplied at runtime
//@ PARAMS ["beta[1]","beta[2]","beta[3]","beta[4]","sigma"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay: reference vs stan · record(beta[1], beta[2], beta[3], beta[4], sigma)

beta[1]

reference vs stan · 24 bins · range 84.7 to 90.0

beta[2]

reference vs stan · 24 bins · range -5.15 to 10.3

beta[3]

reference vs stan · 24 bins · range 0.42 to 0.79

beta[4]

reference vs stan · 24 bins · range -1.00 to 0.07

sigma

reference vs stan · 24 bins · range 16.5 to 20.2

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.0660 ≤ tol 0.2432 · floors 0.0958/0.1141


posteriordb-kidiq_with_mom_work / kidscore_interaction_c2

answer record(beta[1], beta[2], beta[3], beta[4], sigma) stan pass 0.0875

00 statement source: posteriordb/kidiq_with_mom_work-kidscore_interaction_c2

given

For each of N = 434 children, the data provide the child's cognitive test score and two maternal characteristics: whether the child's mother completed high school (1 = yes, 0 = no) and the mother's IQ score. Both maternal predictors are centered at fixed reference points rather than at their sample means: high school completion at 0.5 and IQ at 100. The regression has four coefficients: an intercept, a slope on the centered high school indicator, a slope on the centered IQ, and a slope on the interaction between the two centered predictors. Each of the four coefficients has a flat (improper uniform) prior over the real line. The error standard deviation sigma, constrained positive, has a flat (improper uniform) prior over the positive reals.

model

Each child's test score is normally distributed with mean equal to the intercept plus a linear combination of the centered maternal predictors and their interaction, with a common standard deviation sigma. Specifically, the centered maternal high school indicator is computed as mom_hs minus 0.5, the centered maternal IQ is computed as mom_iq minus 100, and the interaction is the product of these two centered values. The mean is then beta[1] (intercept) plus beta[2] times the centered high school indicator, plus beta[3] times the centered IQ, plus beta[4] times the interaction, with standard deviation sigma.
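
Centering mom_hs at 0.5 (halfway between 0 and 1) and mom_iq at 100 gives the coefficients a direct reading at mom_iq = 100:

- beta[1] is the average of the two groups' predicted scores;
- beta[2] is the difference between those two predicted scores.

At other IQ values, the group difference is beta[2] + beta[4]·(mom_iq - 100). A small Python check of these identities follows, using 0-based indexing (Python beta[0] is the document's beta[1]). The function names are hypothetical and the coefficients arbitrary.

```python
def predict_c2(beta, mom_hs, mom_iq):
    """Mean score with mom_hs centered at 0.5 and mom_iq centered at 100."""
    h = mom_hs - 0.5
    q = mom_iq - 100.0
    return beta[0] + beta[1] * h + beta[2] * q + beta[3] * h * q

def hs_effect_at_iq(beta, mom_iq):
    """Predicted difference (mom_hs = 1 minus mom_hs = 0) at a given mother's IQ."""
    return predict_c2(beta, 1, mom_iq) - predict_c2(beta, 0, mom_iq)
```

For instance, with beta = [87.0, 3.0, 0.7, -0.5] the high-school effect is 3.0 at IQ 100. At IQ 110 it is 3.0 - 0.5 · 10 = -2.0.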

query

The marginal posterior distributions of the five parameters: the intercept beta[1], the coefficient on centered maternal high school completion beta[2], the coefficient on centered maternal IQ beta[3], the coefficient on the interaction between centered maternal high school and centered maternal IQ beta[4], and the error standard deviation sigma.

answer spec record(beta[1], beta[2], beta[3], beta[4], sigma)

{
  "kind": "record",
  "fields": {
    "beta[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[3]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[4]": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

system prompt constant across problems

(system prompt loads here)

01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization 0.088

stan

data {
  int<lower=0> N;
  vector[N] kid_score;
  vector[N] mom_hs;
  vector[N] mom_iq;
}
transformed data {
  // centering on reference points
  vector[N] c2_mom_hs;
  vector[N] c2_mom_iq;
  vector[N] inter;
  c2_mom_hs = mom_hs - 0.5;
  c2_mom_iq = mom_iq - 100;
  inter = c2_mom_hs .* c2_mom_iq;
}
parameters {
  vector[4] beta;
  real<lower=0> sigma;
}
model {
  kid_score ~ normal(beta[1] + beta[2] * c2_mom_hs + beta[3] * c2_mom_iq
                     + beta[4] * inter, sigma);
}

//@ DATA { N: 434, kid_score: [434 values], mom_hs: [434 values], mom_iq: [434 values] }   // values supplied at runtime
//@ PARAMS ["beta[1]","beta[2]","beta[3]","beta[4]","sigma"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay: reference vs stan · record(beta[1], beta[2], beta[3], beta[4], sigma)

beta[1]

reference vs stan · 24 bins · range 83.3 to 90.7

beta[2]

reference vs stan · 24 bins · range -5.36 to 10

beta[3]

reference vs stan · 24 bins · range 0.44 to 1.01

beta[4]

reference vs stan · 24 bins · range -0.99 to -0.02

sigma

reference vs stan · 24 bins · range 16.3 to 20.0

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.0875 ≤ tol 0.2602 · floors 0.1082/0.1038


posteriordb-kidiq_with_mom_work / kidscore_interaction_z

answer record(beta[1], beta[2], beta[3], beta[4], sigma) stan pass 0.1480

00 statement source: posteriordb/kidiq_with_mom_work-kidscore_interaction_z

given

For each of N = 434 children, the data provide the child's cognitive test score, a binary indicator of whether the mother completed high school, and the mother's IQ score. The regression has four coefficients: an intercept, a slope on the standardized mother's high school indicator, a slope on the standardized mother's IQ, and a slope on the interaction between these two standardized predictors. Each predictor is standardized by subtracting its sample mean and dividing by twice its sample standard deviation. The four regression coefficients (beta[1], beta[2], beta[3], beta[4]) each have a flat (improper uniform) prior over the reals. The error standard deviation sigma, constrained positive, has a flat (improper uniform) prior over the positive reals.

model

Each child's test score is normally distributed with a mean equal to the intercept plus the slope on the standardized high school indicator times the child's standardized high school value, plus the slope on standardized mother's IQ times the child's standardized mother's IQ, plus the interaction slope times the product of the two standardized predictors, and a common standard deviation sigma across all children.
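
The standardization in the transformed data block can be sketched with Python's statistics module. This assumes Stan's sd() is the sample standard deviation with an N - 1 denominator, which is also what statistics.stdev computes. After the transformation each predictor has mean 0 and sample standard deviation 0.5. A one-unit change in z_mom_iq therefore corresponds to two standard deviations of mother's IQ. The function name is hypothetical.

```python
import statistics

def standardize_2sd(xs):
    """(x - mean) / (2 * sd), with sd the sample standard deviation
    (n - 1 denominator), mirroring the transformed data block."""
    m = statistics.mean(xs)
    s = statistics.stdev(xs)
    return [(x - m) / (2 * s) for x in xs]

def interaction(a, b):
    """Elementwise product, like the Stan expression z_mom_hs .* z_mom_iq."""
    return [x * y for x, y in zip(a, b)]
```

Applying standardize_2sd to a 0/1 indicator such as mom_hs gives two distinct values. Their difference is 1 / (2 · sd(mom_hs)), so beta[2] is the effect of a two-standard-deviation change in that indicator, not the raw 0-to-1 effect.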

query

The marginal posterior distributions of each of the five parameters: the intercept (reported as beta[1]), the slope on the standardized mother's high school indicator (reported as beta[2]), the slope on the standardized mother's IQ (reported as beta[3]), the slope on the interaction between standardized predictors (reported as beta[4]), and the error standard deviation sigma.

answer spec record(beta[1], beta[2], beta[3], beta[4], sigma)

{
  "kind": "record",
  "fields": {
    "beta[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[3]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[4]": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

system prompt constant across problems

(system prompt loads here)

01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization 0.148

stan

data {
  int<lower=0> N;
  vector[N] kid_score;
  vector[N] mom_hs;
  vector[N] mom_iq;
}
transformed data {
  // standardizing
  vector[N] z_mom_hs;
  vector[N] z_mom_iq;
  vector[N] inter;
  z_mom_hs = (mom_hs - mean(mom_hs)) / (2 * sd(mom_hs));
  z_mom_iq = (mom_iq - mean(mom_iq)) / (2 * sd(mom_iq));
  inter = z_mom_hs .* z_mom_iq;
}
parameters {
  vector[4] beta;
  real<lower=0> sigma;
}
model {
  kid_score ~ normal(beta[1] + beta[2] * z_mom_hs + beta[3] * z_mom_iq
                     + beta[4] * inter, sigma);
}

//@ DATA { N: 434, kid_score: [434 values], mom_hs: [434 values], mom_iq: [434 values] }   // values supplied at runtime
//@ PARAMS ["beta[1]","beta[2]","beta[3]","beta[4]","sigma"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay: reference vs stan · record(beta[1], beta[2], beta[3], beta[4], sigma)

beta[1]

reference vs stan · 24 bins · range 84.9 to 90.2

beta[2]

reference vs stan · 24 bins · range -3.75 to 9.08

beta[3]

reference vs stan · 24 bins · range 12.8 to 23.2

beta[4]

reference vs stan · 24 bins · range -23.6 to 0.10

sigma

reference vs stan · 24 bins · range 16.2 to 20.1

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.1480 ≤ tol 0.8464 · floors 0.2620/0.2591


posteriordb-kidiq_with_mom_work / kidscore_mom_work

answer record(beta[1], beta[2], beta[3], beta[4], sigma) stan pass 0.1269

00 statement source: posteriordb/kidiq_with_mom_work-kidscore_mom_work

given

For each of N = 434 children the data provide the child's cognitive test score and the mother's employment status, encoded as an integer category with four levels (1 to 4). The regression has four coefficients: an intercept beta[1] and coefficients beta[2], beta[3], and beta[4] on indicator variables for categories 2, 3, and 4, so category 1 serves as the baseline. Each coefficient has a flat (improper uniform) prior over the reals. The error standard deviation sigma, constrained positive, has a flat (improper uniform) prior over the positive reals.

model

Each child's test score is normally distributed with a mean determined by the mother's employment status category and a common standard deviation sigma across all children. Specifically, the mean is beta[1] for children whose mother is in category 1, and beta[1] + beta[k] for children whose mother is in category k (k = 2, 3, 4). Thus beta[2], beta[3], and beta[4] are differences relative to category 1.
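
The model uses indicator (dummy) coding with category 1 as the baseline, and the coefficients have flat priors. Their posterior means therefore equal the least-squares estimates. In a saturated one-way layout like this one, those estimates are simple group averages:

- beta[1] is the mean score in category 1;
- beta[k] is the mean in category k minus the mean in category 1.

Below is a Python sketch, assuming every category 1 to 4 has at least one observation. The function names are hypothetical.

```python
def dummies(mom_work):
    """The work2, work3, work4 columns built in the transformed data block."""
    return [[1.0 if w == k else 0.0 for w in mom_work] for k in (2, 3, 4)]

def dummy_coded_estimates(kid_score, mom_work):
    """Least-squares (= flat-prior posterior mean) coefficients [b1, b2, b3, b4]
    for kid_score ~ normal(b1 + b2*work2 + b3*work3 + b4*work4, sigma)."""
    means = {}
    for k in (1, 2, 3, 4):
        group = [y for y, w in zip(kid_score, mom_work) if w == k]
        if not group:
            raise ValueError(f"no observations in category {k}")
        means[k] = sum(group) / len(group)
    return [means[1]] + [means[k] - means[1] for k in (2, 3, 4)]
```

For made-up scores [80, 82, 90, 70, 76, 88] with mom_work = [1, 1, 2, 3, 4, 4], the estimates are [81.0, 9.0, -11.0, 1.0]. The group means are 81, 90, 70 and 82.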

query

The marginal posterior distribution of each of the five parameters: beta[1] (the mean score for employment category 1), beta[2], beta[3], and beta[4] (the differences in mean score for categories 2, 3, and 4 relative to category 1), and sigma (the error standard deviation).

answer spec record(beta[1], beta[2], beta[3], beta[4], sigma)

{
  "kind": "record",
  "fields": {
    "beta[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[3]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[4]": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma": {
      "kind": "dist",
      "domain": "real"
    }
  }
}


01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization (d = 0.127)

stan

data {
  int<lower=0> N;
  vector[N] kid_score;
  array[N] int mom_work;
}
transformed data {
  // indicator (dummy) variables for employment categories 2-4;
  // category 1 is the baseline absorbed by beta[1]
  vector[N] work2;
  vector[N] work3;
  vector[N] work4;
  for (i in 1 : N) {
    work2[i] = mom_work[i] == 2;
    work3[i] = mom_work[i] == 3;
    work4[i] = mom_work[i] == 4;
  }
}
parameters {
  vector[4] beta;
  real<lower=0> sigma;
}
model {
  kid_score ~ normal(beta[1] + beta[2] * work2 + beta[3] * work3
                     + beta[4] * work4, sigma);
}

//@ DATA { N: 434, kid_score: [434 values], mom_work: [434 values] }   // values supplied at runtime
//@ PARAMS ["beta[1]","beta[2]","beta[3]","beta[4]","sigma"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay: reference vs stan · record(beta[1], beta[2], beta[3], beta[4], sigma)

Histogram overlays of reference vs stan draws (24 bins each); plotted range per parameter:
beta[1]: 74.5 to 88.9
beta[2]: -5.61 to 12.8
beta[3]: 1.27 to 23.8
beta[4]: -4.32 to 14.7
sigma: 18.3 to 22.6

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.1269 ≤ tol 0.4941 · floors 0.2195/0.2470


posteriordb-kilpisjarvi_mod / kilpisjarvi

answer record(alpha, beta, sigma) stan pass 1.4862

00 statement source: posteriordb/kilpisjarvi_mod-kilpisjarvi

given

The data comprise N observations, each consisting of a predictor value x and a response value y, plus a prediction location xpred that the model block does not use. The data also supply four hyperparameters for the priors. pmualpha and psalpha are the mean and standard deviation of the prior on the intercept alpha, and pmubeta and psbeta are the mean and standard deviation of the prior on the slope beta. The intercept alpha has a Normal(pmualpha, psalpha) prior, and the slope beta has a Normal(pmubeta, psbeta) prior. The error standard deviation sigma, constrained positive, has an improper flat (uniform) prior over the positive reals.

model

Each observation's response value y is Normal-distributed with a mean equal to the intercept alpha plus the slope beta times that observation's predictor value x, and a common standard deviation sigma across all observations.
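The density the sampler targets can be written out directly. The following Python sketch computes the unnormalized log posterior on the constrained scale. The helper names are hypothetical and the toy data are made up. The result is not identical to Stan's lp__, because Stan drops additive constants in `~` statements and adds the log Jacobian of the transform that keeps sigma positive.

```python
import math


def normal_lpdf(x, mu, sd):
    """Log density of Normal(mu, sd) at x, including the normalizing constant."""
    z = (x - mu) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def log_posterior(alpha, beta, sigma, x, y, pmualpha, psalpha, pmubeta, psbeta):
    """Unnormalized log posterior of the Gaussian linear model.

    sigma has a flat prior on (0, inf), so it adds no prior term; values
    outside that support get log density -inf.
    """
    if sigma <= 0.0:
        return -math.inf
    lp = normal_lpdf(alpha, pmualpha, psalpha) + normal_lpdf(beta, pmubeta, psbeta)
    for xi, yi in zip(x, y):
        lp += normal_lpdf(yi, alpha + beta * xi, sigma)
    return lp


# Toy data (not the Kilpisjarvi series) lying exactly on y = 1 + 2x.
x = [0.0, 1.0, 2.0]
y = [1.0, 3.0, 5.0]
print(log_posterior(1.0, 2.0, 0.5, x, y, 0.0, 100.0, 0.0, 10.0))
```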

query

The marginal posterior distribution of each of the three parameters: alpha (the intercept), beta (the slope), and sigma (the error standard deviation).

answer spec record(alpha, beta, sigma)

{
  "kind": "record",
  "fields": {
    "alpha": {
      "kind": "dist",
      "domain": "real"
    },
    "beta": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma": {
      "kind": "dist",
      "domain": "real"
    }
  }
}


01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization (d = 1.486)

stan

// Gaussian linear model with adjustable priors
data {
  int<lower=0> N; // number of data points
  vector[N] x; // predictor values
  vector[N] y; // response values
  real xpred; // input location for prediction (not used in the model block)
  real pmualpha; // prior mean for alpha
  real psalpha; // prior std for alpha
  real pmubeta; // prior mean for beta
  real psbeta; // prior std for beta
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  alpha ~ normal(pmualpha, psalpha);
  beta ~ normal(pmubeta, psbeta);
  y ~ normal(alpha + beta * x, sigma);
}

//@ DATA { N: 62, x: [62 values], y: [62 values], xpred: 2016, pmualpha: 9.31290322580645, psalpha: 100, pmubeta: 0, psbeta: 0.0333333333333333 }   // values supplied at runtime
//@ PARAMS ["alpha","beta","sigma"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay: reference vs stan · record(alpha, beta, sigma)

Histogram overlays of reference vs stan draws (24 bins each); plotted range per parameter:
alpha: -144 to 28.1
beta: -0.01 to 0.04
sigma: 0.90 to 1.57

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=1.4862 ≤ tol 5.6980 · floors 2.8490/1.9225


posteriordb-low_dim_gauss_mix / low_dim_gauss_mix

answer record(mu[1], mu[2], sigma[1], sigma[2], theta) stan pass 0.0015

00 statement source: posteriordb/low_dim_gauss_mix-low_dim_gauss_mix

given

For each of N observations, the data provide a single real value y. The model is a two-component Gaussian mixture in which the component means are ordered (mu[1] < mu[2]) and each component has its own positive standard deviation. The means mu[1] and mu[2] each have a Normal(0, 2) prior, subject to the ordering constraint. The standard deviations sigma[1] and sigma[2] each have an independent Normal(0, 2) prior restricted to the positive reals, that is, a half-normal prior. The mixing weight theta, the probability of component 1, is constrained to [0, 1] and has a Beta(5, 5) prior.
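Stan enforces the ordering of mu through a change of variables rather than by rejecting proposals. The following minimal Python sketch shows the `ordered` transform as described in the Stan reference manual. The function name is hypothetical. The sketch omits the log-Jacobian term (the sum of the unconstrained values after the first) that Stan adds to the target automatically.

```python
import math


def ordered_from_unconstrained(y):
    """Map an unconstrained vector to an increasing one.

    x[0] = y[0] and x[k] = x[k-1] + exp(y[k]) for k >= 1, so consecutive
    entries differ by exp(y[k]) > 0 regardless of the sign of y[k].
    """
    x = [y[0]]
    for yk in y[1:]:
        x.append(x[-1] + math.exp(yk))
    return x


# Arbitrary unconstrained inputs, chosen only for illustration.
mu = ordered_from_unconstrained([-2.7, math.log(5.5)])
print(mu)
```

Because the sampler moves on the unconstrained values, every draw satisfies mu[1] < mu[2]. This removes the label-switching symmetry between the two components.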

model

The observations are independent draws from a two-component Gaussian mixture. With probability theta, an observation comes from component 1, a Normal distribution with mean mu[1] and standard deviation sigma[1]. With probability 1 - theta, it comes from component 2, a Normal distribution with mean mu[2] and standard deviation sigma[2]. The component means satisfy mu[1] < mu[2]. Component memberships are not observed and are summed out, so the density of each observation y[n] is theta * Normal(y[n] | mu[1], sigma[1]) + (1 - theta) * Normal(y[n] | mu[2], sigma[2]).
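The model block evaluates each observation's mixture density on the log scale with Stan's `log_mix`. The following minimal Python sketch of that computation assumes the documented definition log_mix(theta, lp1, lp2) = log(theta * exp(lp1) + (1 - theta) * exp(lp2)). The helper names are hypothetical. Combining the two terms with a log-sum-exp avoids underflow when both component densities are tiny.

```python
import math


def normal_lpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def log_mix(theta, lp1, lp2):
    """log(theta * exp(lp1) + (1 - theta) * exp(lp2)) via log-sum-exp.

    Requires 0 < theta < 1; math.log raises ValueError at the endpoints.
    """
    a = math.log(theta) + lp1
    b = math.log1p(-theta) + lp2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def mixture_log_lik(y, mu, sigma, theta):
    """Sum of per-observation mixture log densities, mirroring the model-block loop."""
    return sum(
        log_mix(theta,
                normal_lpdf(yn, mu[0], sigma[0]),
                normal_lpdf(yn, mu[1], sigma[1]))
        for yn in y
    )


# Hypothetical observations and parameter values.
print(mixture_log_lik([-2.7, 2.9, 0.1], mu=[-2.7, 2.9], sigma=[1.0, 1.0], theta=0.6))
```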

query

The marginal posterior distributions of the five parameters: mu[1] and mu[2] (the ordered component means), sigma[1] and sigma[2] (the component standard deviations), and theta (the mixing weight for component 1).

answer spec record(mu[1], mu[2], sigma[1], sigma[2], theta)

{
  "kind": "record",
  "fields": {
    "mu[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "mu[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "theta": {
      "kind": "dist",
      "domain": "real"
    }
  }
}


01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization (d = 0.001)

stan

data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  ordered[2] mu;
  array[2] real<lower=0> sigma;
  real<lower=0, upper=1> theta;
}
model {
  sigma ~ normal(0, 2);
  mu ~ normal(0, 2);
  theta ~ beta(5, 5);
  for (n in 1 : N) {
    target += log_mix(theta, normal_lpdf(y[n] | mu[1], sigma[1]),
                      normal_lpdf(y[n] | mu[2], sigma[2]));
  }
}

//@ DATA { N: 1000, y: [1000 values] }   // values supplied at runtime
//@ PARAMS ["mu[1]","mu[2]","sigma[1]","sigma[2]","theta"]
//@ SAMPLING {"chains":8,"iter_warmup":6000,"iter_sampling":3000,"adapt_delta":0.8}

02 answer overlay: reference vs stan · record(mu[1], mu[2], sigma[1], sigma[2], theta)

Histogram overlays of reference vs stan draws (24 bins each); plotted range per parameter:
mu[1]: -2.86 to -2.60
mu[2]: 2.68 to 3.04
sigma[1]: 0.94 to 1.13
sigma[2]: 0.92 to 1.16
theta: 0.57 to 0.66

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.0015 ≤ tol 0.0069 · floors 0.0008/0.0034


posteriordb-mcycle_gp / accel_gp

answer record(Intercept, sdgp_1, lscale_1, zgp_1[1], zgp_1[2], zgp_1[3], zgp_1[4], zgp_1[5], zgp_1[6], zgp_1[7], zgp_1[8], zgp_1[9], zgp_1[10], zgp_1[11], zgp_1[12], zgp_1[13], zgp_1[14], zgp_1[15], zgp_1[16], zgp_1[17], zgp_1[18], zgp_1[19], zgp_1[20], zgp_1[21], zgp_1[22], zgp_1[23], zgp_1[24], zgp_1[25], zgp_1[26], zgp_1[27], zgp_1[28], zgp_1[29], zgp_1[30], zgp_1[31], zgp_1[32], zgp_1[33], zgp_1[34], zgp_1[35], zgp_1[36], zgp_1[37], zgp_1[38], zgp_1[39], zgp_1[40], Intercept_sigma, sdgp_sigma_1, lscale_sigma_1, zgp_sigma_1[1], zgp_sigma_1[2], zgp_sigma_1[3], zgp_sigma_1[4], zgp_sigma_1[5], zgp_sigma_1[6], zgp_sigma_1[7], zgp_sigma_1[8], zgp_sigma_1[9], zgp_sigma_1[10], zgp_sigma_1[11], zgp_sigma_1[12], zgp_sigma_1[13], zgp_sigma_1[14], zgp_sigma_1[15], zgp_sigma_1[16], zgp_sigma_1[17], zgp_sigma_1[18], zgp_sigma_1[19], zgp_sigma_1[20]) stan pass 0.6570

00 statement source: posteriordb/mcycle_gp-accel_gp

given

For N = 133 observations, the data provide acceleration measurements Y[1], ..., Y[N] at the corresponding time points. The model uses two latent Gaussian processes (GPs): one for the mean acceleration and one for the log standard deviation. Each GP is approximated by a reduced-rank spectral method using Laplacian eigenfunctions, with 40 basis functions for the mean GP and 20 for the log-SD GP. For each GP, the data supply the basis matrix (the eigenfunctions evaluated at the time points) and the square roots of the corresponding eigenvalues.

Priors: The mean intercept (Intercept) has a Student-t(degrees of freedom 3, location -13, scale 36) prior. The mean GP's marginal standard deviation (sdgp_1), constrained positive, has a Student-t(degrees of freedom 3, location 0, scale 36) prior truncated to the positive reals. The mean GP's length-scale (lscale_1), constrained positive, has an inverse-gamma(shape 1.124909, scale 0.0177) prior. The latent coefficients of the mean GP (zgp_1[1] through zgp_1[40]) each have a standard normal prior. The log-SD intercept (Intercept_sigma) has a Student-t(degrees of freedom 3, location 0, scale 10) prior. The log-SD GP's marginal standard deviation (sdgp_sigma_1), constrained positive, has a Student-t(degrees of freedom 3, location 0, scale 36) prior truncated to the positive reals. The log-SD GP's length-scale (lscale_sigma_1), constrained positive, has an inverse-gamma(shape 1.124909, scale 0.0177) prior. The latent coefficients of the log-SD GP (zgp_sigma_1[1] through zgp_sigma_1[20]) each have a standard normal prior.
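The Stan code implements each truncated Student-t prior by subtracting student_t_lccdf(0 | 3, 0, 36) from the Student-t log density. The distribution is symmetric about 0, so that log complementary CDF equals log(1/2), and the truncation simply adds log 2, giving a half-Student-t. The following minimal Python sketch uses hypothetical helper names.

```python
import math


def student_t_lpdf(x, nu, mu, s):
    """Log density of a location-scale Student-t distribution."""
    z = (x - mu) / s
    return (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * math.log(nu * math.pi) - math.log(s)
            - (nu + 1.0) / 2.0 * math.log1p(z * z / nu))


def half_student_t_lpdf(x, nu, s):
    """Student-t(nu, 0, s) truncated to x > 0.

    P(X > 0) = 1/2 for a t centered at 0, so dividing by it adds log(2);
    this matches subtracting student_t_lccdf(0 | nu, 0, s) in Stan.
    """
    if x <= 0.0:
        return -math.inf
    return student_t_lpdf(x, nu, 0.0, s) - math.log(0.5)


# Prior log density of sdgp_1 = 40 under the half-t(3, 0, 36) prior.
print(half_student_t_lpdf(40.0, 3.0, 36.0))
```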

model

Each observed acceleration Y[n] is drawn from a normal distribution with two latent components:

- Mean: an intercept plus a smooth latent function given by a Gaussian process with an exponentiated-quadratic covariance kernel.
- Standard deviation: the exponential of a second latent function, namely another intercept plus a second Gaussian process with the same kernel.

Both GPs have the same parametric structure: a marginal standard deviation, a length-scale that controls smoothness, and a vector of latent standard-normal coefficients. In the spectral approximation, each coefficient is scaled by the square root of the kernel's spectral density, evaluated at the square root of the corresponding eigenvalue. The scaled coefficients then weight fixed basis functions (the Laplacian eigenfunctions evaluated at the observation times) to produce smooth mean and log-SD functions.
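The spectral approximation can be sketched in Python for a one-dimensional input. The sketch mirrors the `spd_cov_exp_quad` and `gpa` functions in the Stan listing. The benchmark supplies the basis matrix and eigenvalue square roots as data. Here they are built from the standard Laplacian eigenfunctions on an interval [-L, L] with Dirichlet boundaries. That construction is an assumption about how the benchmark's inputs were computed (brms may also center and scale the time variable first). The function names, the half-width `L`, and all numeric inputs are hypothetical.

```python
import math


def spd_exp_quad(omega, sdgp, lscale):
    """Spectral density of the 1-D exponentiated-quadratic kernel at frequency omega."""
    return sdgp ** 2 * math.sqrt(2.0 * math.pi) * lscale * math.exp(-0.5 * (lscale * omega) ** 2)


def eigenfunction(x, j, L):
    """j-th Laplacian eigenfunction (j = 1, 2, ...) on [-L, L], zero at both ends."""
    return math.sqrt(1.0 / L) * math.sin(math.pi * j * (x + L) / (2.0 * L))


def approx_gp(xs, z, sdgp, lscale, L):
    """Approximate GP values at xs: sum_j phi_j(x) * sqrt(S(sqrt(lambda_j))) * z_j."""
    m = len(z)
    # Square roots of the eigenvalues, playing the role of slambda_1 in the Stan data.
    slambda = [math.pi * j / (2.0 * L) for j in range(1, m + 1)]
    # Equivalent of diag_spd .* zgp inside gpa().
    weights = [math.sqrt(spd_exp_quad(w, sdgp, lscale)) * zj for w, zj in zip(slambda, z)]
    # Equivalent of X * (diag_spd .* zgp), with X[n, j] = phi_j(x_n).
    return [sum(eigenfunction(x, j + 1, L) * weights[j] for j in range(m)) for x in xs]


# Hypothetical inputs: five time points on [-1, 1] and three basis functions.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
f = approx_gp(xs, z=[0.5, -1.0, 0.3], sdgp=1.0, lscale=0.3, L=1.5)
print(f)
```

In the model, the same computation runs twice. The first call gives the mean function. The second call, with the sigma GP's parameters, gives a function that is added to Intercept_sigma and exponentiated to produce the observation standard deviation.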

query

The marginal posterior distributions of all 66 parameters:

- Intercept (the mean intercept)
- sdgp_1 (the mean GP's marginal standard deviation)
- lscale_1 (the mean GP's length-scale)
- zgp_1[1] through zgp_1[40] (the 40 latent coefficients of the mean GP)
- Intercept_sigma (the log-SD intercept)
- sdgp_sigma_1 (the log-SD GP's marginal standard deviation)
- lscale_sigma_1 (the log-SD GP's length-scale)
- zgp_sigma_1[1] through zgp_sigma_1[20] (the 20 latent coefficients of the log-SD GP)

answer spec record(Intercept, sdgp_1, lscale_1, zgp_1[1], zgp_1[2], zgp_1[3], zgp_1[4], zgp_1[5], zgp_1[6], zgp_1[7], zgp_1[8], zgp_1[9], zgp_1[10], zgp_1[11], zgp_1[12], zgp_1[13], zgp_1[14], zgp_1[15], zgp_1[16], zgp_1[17], zgp_1[18], zgp_1[19], zgp_1[20], zgp_1[21], zgp_1[22], zgp_1[23], zgp_1[24], zgp_1[25], zgp_1[26], zgp_1[27], zgp_1[28], zgp_1[29], zgp_1[30], zgp_1[31], zgp_1[32], zgp_1[33], zgp_1[34], zgp_1[35], zgp_1[36], zgp_1[37], zgp_1[38], zgp_1[39], zgp_1[40], Intercept_sigma, sdgp_sigma_1, lscale_sigma_1, zgp_sigma_1[1], zgp_sigma_1[2], zgp_sigma_1[3], zgp_sigma_1[4], zgp_sigma_1[5], zgp_sigma_1[6], zgp_sigma_1[7], zgp_sigma_1[8], zgp_sigma_1[9], zgp_sigma_1[10], zgp_sigma_1[11], zgp_sigma_1[12], zgp_sigma_1[13], zgp_sigma_1[14], zgp_sigma_1[15], zgp_sigma_1[16], zgp_sigma_1[17], zgp_sigma_1[18], zgp_sigma_1[19], zgp_sigma_1[20])

{
  "kind": "record",
  "fields": {
    "Intercept": {
      "kind": "dist",
      "domain": "real"
    },
    "sdgp_1": {
      "kind": "dist",
      "domain": "real"
    },
    "lscale_1": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[3]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[4]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[5]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[6]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[7]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[8]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[9]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[10]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[11]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[12]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[13]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[14]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[15]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[16]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[17]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[18]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[19]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[20]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[21]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[22]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[23]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[24]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[25]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[26]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[27]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[28]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[29]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[30]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[31]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[32]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[33]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[34]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[35]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[36]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[37]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[38]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[39]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_1[40]": {
      "kind": "dist",
      "domain": "real"
    },
    "Intercept_sigma": {
      "kind": "dist",
      "domain": "real"
    },
    "sdgp_sigma_1": {
      "kind": "dist",
      "domain": "real"
    },
    "lscale_sigma_1": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_sigma_1[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_sigma_1[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_sigma_1[3]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_sigma_1[4]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_sigma_1[5]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_sigma_1[6]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_sigma_1[7]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_sigma_1[8]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_sigma_1[9]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_sigma_1[10]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_sigma_1[11]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_sigma_1[12]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_sigma_1[13]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_sigma_1[14]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_sigma_1[15]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_sigma_1[16]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_sigma_1[17]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_sigma_1[18]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_sigma_1[19]": {
      "kind": "dist",
      "domain": "real"
    },
    "zgp_sigma_1[20]": {
      "kind": "dist",
      "domain": "real"
    }
  }
}


01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization (d = 0.657)

stan

// generated with brms 2.10.0
functions {
  /* compute a latent Gaussian process
   * Args:
   *   x: array of continuous predictor values
   *   sdgp: marginal SD parameter
   *   lscale: length-scale parameter
   *   zgp: vector of independent standard normal variables
   * Returns:
   *   a vector to be added to the linear predictor
   */
  vector gp(array[] vector x, real sdgp, vector lscale, vector zgp) {
    int Dls = rows(lscale);
    int N = size(x);
    matrix[N, N] cov;
    if (Dls == 1) {
      // one dimensional or isotropic GP
      cov = gp_exp_quad_cov(x, sdgp, lscale[1]);
    } else {
      // multi-dimensional non-isotropic GP
      cov = gp_exp_quad_cov(x[ : , 1], sdgp, lscale[1]);
      for (d in 2 : Dls) {
        cov = cov .* gp_exp_quad_cov(x[ : , d], 1, lscale[d]);
      }
    }
    for (n in 1 : N) {
      // deal with numerical non-positive-definiteness
      cov[n, n] += 1e-12;
    }
    return cholesky_decompose(cov) * zgp;
  }

  /* Spectral density function of a Gaussian process
   * Args:
   *   x: array of numeric values of dimension NB x D
   *   sdgp: marginal SD parameter
   *   lscale: vector of length-scale parameters
   * Returns:
   *   numeric values of the function evaluated at 'x'
   */
  vector spd_cov_exp_quad(array[] vector x, real sdgp, vector lscale) {
    int NB = dims(x)[1];
    int D = dims(x)[2];
    int Dls = rows(lscale);
    vector[NB] out;
    if (Dls == 1) {
      // one dimensional or isotropic GP
      real constant = square(sdgp) * (sqrt(2 * pi()) * lscale[1]) ^ D;
      real neg_half_lscale2 = -0.5 * square(lscale[1]);
      for (m in 1 : NB) {
        out[m] = constant * exp(neg_half_lscale2 * dot_self(x[m]));
      }
    } else {
      // multi-dimensional non-isotropic GP
      real constant = square(sdgp) * sqrt(2 * pi()) ^ D * prod(lscale);
      vector[Dls] neg_half_lscale2 = -0.5 * square(lscale);
      for (m in 1 : NB) {
        out[m] = constant * exp(dot_product(neg_half_lscale2, square(x[m])));
      }
    }
    return out;
  }
  /* compute an approximate latent Gaussian process
   * Args:
   *   X: Matrix of Laplacian eigen functions at the covariate values
   *   sdgp: marginal SD parameter
   *   lscale: vector of length-scale parameters
   *   zgp: vector of independent standard normal variables
   *   slambda: square root of the Laplacian eigen values
   * Returns:
   *   a vector to be added to the linear predictor
   */
  vector gpa(matrix X, real sdgp, vector lscale, vector zgp,
             array[] vector slambda) {
    vector[cols(X)] diag_spd = sqrt(spd_cov_exp_quad(slambda, sdgp, lscale));
    return X * (diag_spd .* zgp);
  }
}
data {
  int<lower=1> N; // number of observations
  vector[N] Y; // response variable
  // data related to GPs
  // number of sub-GPs (equal to 1 unless 'by' was used)
  int<lower=1> Kgp_1;
  int<lower=1> Dgp_1; // GP dimension
  // number of basis functions of an approximate GP
  int<lower=1> NBgp_1;
  // approximate GP basis matrices
  matrix[N, NBgp_1] Xgp_1;
  // approximate GP eigenvalues
  array[NBgp_1] vector[Dgp_1] slambda_1;
  // data related to GPs
  // number of sub-GPs (equal to 1 unless 'by' was used)
  int<lower=1> Kgp_sigma_1;
  int<lower=1> Dgp_sigma_1; // GP dimension
  // number of basis functions of an approximate GP
  int<lower=1> NBgp_sigma_1;
  // approximate GP basis matrices
  matrix[N, NBgp_sigma_1] Xgp_sigma_1;
  // approximate GP eigenvalues
  array[NBgp_sigma_1] vector[Dgp_sigma_1] slambda_sigma_1;
  int prior_only; // should the likelihood be ignored?
}
transformed data {

}
parameters {
  // temporary intercept for centered predictors
  real Intercept;
  // GP standard deviation parameters
  real<lower=0> sdgp_1;
  // GP length-scale parameters
  real<lower=0> lscale_1;
  // latent variables of the GP
  vector[NBgp_1] zgp_1;
  // temporary intercept for centered predictors
  real Intercept_sigma;
  // GP standard deviation parameters
  real<lower=0> sdgp_sigma_1;
  // GP length-scale parameters
  real<lower=0> lscale_sigma_1;
  // latent variables of the GP
  vector[NBgp_sigma_1] zgp_sigma_1;
}
transformed parameters {
  // vector versions of real parameters
  vector<lower=0>[Kgp_1] vsdgp_1;
  array[Kgp_1] vector<lower=0>[1] vlscale_1;
  vector<lower=0>[Kgp_sigma_1] vsdgp_sigma_1;
  array[Kgp_sigma_1] vector<lower=0>[1] vlscale_sigma_1;
  vsdgp_1[1] = sdgp_1;
  vlscale_1[1, 1] = lscale_1;
  vsdgp_sigma_1[1] = sdgp_sigma_1;
  vlscale_sigma_1[1, 1] = lscale_sigma_1;
}
model {
  // initialize linear predictor term
  vector[N] mu = Intercept + rep_vector(0, N)
                 + gpa(Xgp_1, vsdgp_1[1], vlscale_1[1], zgp_1, slambda_1);
  // initialize linear predictor term
  vector[N] sigma = Intercept_sigma + rep_vector(0, N)
                    + gpa(Xgp_sigma_1, vsdgp_sigma_1[1], vlscale_sigma_1[1],
                          zgp_sigma_1, slambda_sigma_1);
  for (n in 1 : N) {
    // apply the inverse link function
    sigma[n] = exp(sigma[n]);
  }
  // priors including all constants
  target += student_t_lpdf(Intercept | 3, -13, 36);
  target += student_t_lpdf(vsdgp_1 | 3, 0, 36)
            - 1 * student_t_lccdf(0 | 3, 0, 36);
  target += normal_lpdf(zgp_1 | 0, 1);
  target += inv_gamma_lpdf(vlscale_1[1] | 1.124909, 0.0177);
  target += student_t_lpdf(Intercept_sigma | 3, 0, 10);
  target += student_t_lpdf(vsdgp_sigma_1 | 3, 0, 36)
            - 1 * student_t_lccdf(0 | 3, 0, 36);
  target += normal_lpdf(zgp_sigma_1 | 0, 1);
  target += inv_gamma_lpdf(vlscale_sigma_1[1] | 1.124909, 0.0177);
  // likelihood including all constants
  if (!prior_only) {
    target += normal_lpdf(Y | mu, sigma);
  }
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept;
  // actual population-level intercept
  real b_sigma_Intercept = Intercept_sigma;
}

//@ DATA { N: 133, Y: [133 values], Dgp_1: 1, NBgp_1: 40, Kgp_1: 1, Xgp_1: [133×40 matrix], slambda_1: [40×1 matrix], Dgp_sigma_1: 1, NBgp_sigma_1: 20, Kgp_sigma_1: 1, Xgp_sigma_1: [133×20 matrix], slambda_sigma_1: [20×1 matrix], prior_only: 0 }   // values supplied at runtime
//@ PARAMS ["Intercept","sdgp_1","lscale_1","zgp_1[1]","zgp_1[2]","zgp_1[3]","zgp_1[4]","zgp_1[5]","zgp_1[6]","zgp_1[7]","zgp_1[8]","zgp_1[9]","zgp_1[10]","zgp_1[11]","zgp_1[12]","zgp_1[13]","zgp_1[14]","zgp_1[15]","zgp_1[16]","zgp_1[17]","zgp_1[18]","zgp_1[19]","zgp_1[20]","zgp_1[21]","zgp_1[22]","zgp_1[23]","zgp_1[24]","zgp_1[25]","zgp_1[26]","zgp_1[27]","zgp_1[28]","zgp_1[29]","zgp_1[30]","zgp_1[31]","zgp_1[32]","zgp_1[33]","zgp_1[34]","zgp_1[35]","zgp_1[36]","zgp_1[37]","zgp_1[38]","zgp_1[39]","zgp_1[40]","Intercept_sigma","sdgp_sigma_1","lscale_sigma_1","zgp_sigma_1[1]","zgp_sigma_1[2]","zgp_sigma_1[3]","zgp_sigma_1[4]","zgp_sigma_1[5]","zgp_sigma_1[6]","zgp_sigma_1[7]","zgp_sigma_1[8]","zgp_sigma_1[9]","zgp_sigma_1[10]","zgp_sigma_1[11]","zgp_sigma_1[12]","zgp_sigma_1[13]","zgp_sigma_1[14]","zgp_sigma_1[15]","zgp_sigma_1[16]","zgp_sigma_1[17]","zgp_sigma_1[18]","zgp_sigma_1[19]","zgp_sigma_1[20]"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay: reference vs stan · record(Intercept, sdgp_1, lscale_1, zgp_1[1], zgp_1[2], zgp_1[3], zgp_1[4], zgp_1[5], zgp_1[6], zgp_1[7], zgp_1[8], zgp_1[9], zgp_1[10], zgp_1[11], zgp_1[12], zgp_1[13], zgp_1[14], zgp_1[15], zgp_1[16], zgp_1[17], zgp_1[18], zgp_1[19], zgp_1[20], zgp_1[21], zgp_1[22], zgp_1[23], zgp_1[24], zgp_1[25], zgp_1[26], zgp_1[27], zgp_1[28], zgp_1[29], zgp_1[30], zgp_1[31], zgp_1[32], zgp_1[33], zgp_1[34], zgp_1[35], zgp_1[36], zgp_1[37], zgp_1[38], zgp_1[39], zgp_1[40], Intercept_sigma, sdgp_sigma_1, lscale_sigma_1, zgp_sigma_1[1], zgp_sigma_1[2], zgp_sigma_1[3], zgp_sigma_1[4], zgp_sigma_1[5], zgp_sigma_1[6], zgp_sigma_1[7], zgp_sigma_1[8], zgp_sigma_1[9], zgp_sigma_1[10], zgp_sigma_1[11], zgp_sigma_1[12], zgp_sigma_1[13], zgp_sigma_1[14], zgp_sigma_1[15], zgp_sigma_1[16], zgp_sigma_1[17], zgp_sigma_1[18], zgp_sigma_1[19], zgp_sigma_1[20])

parameter	reference mean±sd	stan mean±sd
Intercept	-10.68 ± 15.99	-11.37 ± 15.86
sdgp_1	43.62 ± 11.58	42.95 ± 10.30
lscale_1	0.081 ± 0.016	0.079 ± 0.016
zgp_1[1]	0.019 ± 0.944	-0.021 ± 0.953
zgp_1[2]	-0.251 ± 0.858	-0.288 ± 0.884
zgp_1[3]	0.400 ± 0.877	0.395 ± 0.937
zgp_1[4]	0.382 ± 0.752	0.382 ± 0.709
zgp_1[5]	-0.783 ± 0.860	-0.723 ± 0.729
zgp_1[6]	-0.473 ± 0.841	-0.453 ± 0.762
zgp_1[7]	0.704 ± 0.877	0.719 ± 0.763
zgp_1[8]	0.670 ± 0.771	0.662 ± 0.771
zgp_1[9]	-0.453 ± 0.777	-0.516 ± 0.769
zgp_1[10]	-0.917 ± 0.793	-0.933 ± 0.863
zgp_1[11]	0.485 ± 0.782	0.441 ± 0.851
zgp_1[12]	1.01 ± 0.868	0.993 ± 0.771
zgp_1[13]	-0.570 ± 0.835	-0.540 ± 0.809
zgp_1[14]	-0.878 ± 0.829	-0.879 ± 0.789
zgp_1[15]	0.509 ± 0.808	0.564 ± 0.826
zgp_1[16]	0.754 ± 0.801	0.791 ± 0.868
zgp_1[17]	-0.366 ± 0.826	-0.357 ± 0.772
zgp_1[18]	-0.842 ± 0.875	-0.793 ± 0.913
zgp_1[19]	0.287 ± 0.799	0.222 ± 0.812
zgp_1[20]	1.06 ± 0.800	1.05 ± 0.776
zgp_1[21]	-0.098 ± 0.816	-0.138 ± 0.739
zgp_1[22]	-1.08 ± 0.829	-1.05 ± 0.827
zgp_1[23]	0.077 ± 0.780	0.076 ± 0.805
zgp_1[24]	0.839 ± 0.830	0.831 ± 0.838
zgp_1[25]	-0.016 ± 0.814	-0.086 ± 0.824
zgp_1[26]	-0.604 ± 0.875	-0.598 ± 0.785
zgp_1[27]	0.021 ± 0.814	0.026 ± 0.838
zgp_1[28]	0.366 ± 0.867	0.385 ± 0.847
zgp_1[29]	-0.017 ± 0.823	3.26e-4 ± 0.894
zgp_1[30]	-0.325 ± 0.858	-0.231 ± 0.864
zgp_1[31]	-0.196 ± 0.846	-0.143 ± 0.920
zgp_1[32]	0.207 ± 0.836	0.102 ± 0.845
zgp_1[33]	0.284 ± 0.873	0.318 ± 0.884
zgp_1[34]	0.036 ± 0.849	0.030 ± 0.845
zgp_1[35]	-0.646 ± 0.925	-0.578 ± 0.867
zgp_1[36]	-0.213 ± 0.848	-0.148 ± 0.906
zgp_1[37]	0.609 ± 0.884	0.629 ± 0.812
zgp_1[38]	0.288 ± 0.882	0.250 ± 0.883
zgp_1[39]	-0.479 ± 0.905	-0.423 ± 0.878
zgp_1[40]	-0.386 ± 0.920	-0.416 ± 0.867
Intercept_sigma	2.40 ± 0.636	2.43 ± 0.555
sdgp_sigma_1	2.99 ± 1.57	2.84 ± 1.44
lscale_sigma_1	0.031 ± 0.037	0.029 ± 0.032
zgp_sigma_1[1]	0.080 ± 1.02	0.056 ± 0.940
zgp_sigma_1[2]	-0.513 ± 0.881	-0.479 ± 0.946
zgp_sigma_1[3]	-0.322 ± 0.949	-0.345 ± 0.969
zgp_sigma_1[4]	0.825 ± 0.800	0.874 ± 0.768
zgp_sigma_1[5]	0.655 ± 0.843	0.630 ± 0.859
zgp_sigma_1[6]	-0.880 ± 0.875	-0.865 ± 0.802
zgp_sigma_1[7]	-0.756 ± 0.843	-0.767 ± 0.835
zgp_sigma_1[8]	0.430 ± 0.831	0.493 ± 0.869
zgp_sigma_1[9]	0.673 ± 0.871	0.753 ± 0.832
zgp_sigma_1[10]	0.064 ± 0.855	0.084 ± 0.891
zgp_sigma_1[11]	-0.501 ± 0.801	-0.442 ± 0.764
zgp_sigma_1[12]	-0.504 ± 0.795	-0.498 ± 0.847
zgp_sigma_1[13]	0.128 ± 0.823	0.185 ± 0.869
zgp_sigma_1[14]	0.573 ± 0.841	0.536 ± 0.865
zgp_sigma_1[15]	0.223 ± 0.817	0.274 ± 0.757
zgp_sigma_1[16]	-0.312 ± 0.768	-0.339 ± 0.831
zgp_sigma_1[17]	-0.684 ± 0.800	-0.640 ± 0.762
zgp_sigma_1[18]	0.064 ± 0.814	0.013 ± 0.750
zgp_sigma_1[19]	0.671 ± 0.571	0.675 ± 0.620
zgp_sigma_1[20]	0.407 ± 0.628	0.339 ± 0.567

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.6570 ≤ tol 2.6900 · floors 0.7507/1.3450


posteriordb-mesquite / logmesquite

answer record(beta[1], beta[2], beta[3], beta[4], beta[5], beta[6], beta[7], sigma) stan pass 0.0131

00 statement source: posteriordb/mesquite-logmesquite

given

For each of N = 46 mesquite trees, the data provide the tree's weight, two diameter measurements (diam1 and diam2), canopy height, total height, density, and a binary group indicator. All continuous measurements (weight, diam1, diam2, canopy height, total height, and density) are log-transformed before fitting. The model has seven regression coefficients (an intercept and six slopes), each with a flat (improper uniform) prior over the reals. The error standard deviation sigma, constrained positive, has a flat (improper uniform) prior over the positive reals.

model

Each tree's log weight is normally distributed around a linear predictor. The predictor is an intercept plus a slope times each of six covariates: log diam1, log diam2, log canopy height, log total height, log density, and the untransformed group indicator. The standard deviation sigma is common to all observations.
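The linear predictor can be written as a small Python function that mirrors the Stan model block. The function name is hypothetical. The response and five covariates are on the log scale. On the original scale the model is therefore multiplicative, for example weight proportional to diam1^beta[2] times diam2^beta[3], with lognormal error.

```python
import math


def mesquite_mean_log_weight(beta, diam1, diam2, canopy_height, total_height, density, group):
    """Mean of log(weight) for one tree under the logmesquite regression.

    beta holds seven coefficients; beta[0] here corresponds to Stan's beta[1].
    """
    return (beta[0]
            + beta[1] * math.log(diam1)
            + beta[2] * math.log(diam2)
            + beta[3] * math.log(canopy_height)
            + beta[4] * math.log(total_height)
            + beta[5] * math.log(density)
            + beta[6] * group)  # group (0 or 1) enters without a log


# Hypothetical coefficients and measurements, for illustration only.
print(mesquite_mean_log_weight([5.3, 0.4, 1.0, 0.4, 0.4, 0.1, -0.6], 1.2, 0.9, 0.8, 1.5, 1.0, 1))
```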

query

The marginal posterior distribution of each of the eight parameters: the intercept (reported as beta[1]), the slope on log-diameter 1 (reported as beta[2]), the slope on log-diameter 2 (reported as beta[3]), the slope on log-canopy height (reported as beta[4]), the slope on log-total height (reported as beta[5]), the slope on log-density (reported as beta[6]), the slope on group (reported as beta[7]), and the error standard deviation sigma.

answer spec record(beta[1], beta[2], beta[3], beta[4], beta[5], beta[6], beta[7], sigma)

{
  "kind": "record",
  "fields": {
    "beta[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[3]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[4]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[5]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[6]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[7]": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma": {
      "kind": "dist",
      "domain": "real"
    }
  }
}


01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization (d = 0.013)

stan

data {
  int<lower=0> N;
  vector[N] weight;
  vector[N] diam1;
  vector[N] diam2;
  vector[N] canopy_height;
  vector[N] total_height;
  vector[N] density;
  vector[N] group;
}
transformed data {
  // log transformations
  vector[N] log_weight;
  vector[N] log_diam1;
  vector[N] log_diam2;
  vector[N] log_canopy_height;
  vector[N] log_total_height;
  vector[N] log_density;
  log_weight = log(weight);
  log_diam1 = log(diam1);
  log_diam2 = log(diam2);
  log_canopy_height = log(canopy_height);
  log_total_height = log(total_height);
  log_density = log(density);
}
parameters {
  vector[7] beta;
  real<lower=0> sigma;
}
model {
  log_weight ~ normal(beta[1] + beta[2] * log_diam1 + beta[3] * log_diam2
                      + beta[4] * log_canopy_height
                      + beta[5] * log_total_height + beta[6] * log_density
                      + beta[7] * group, sigma);
}

//@ DATA { N: 46, canopy_height: [46 values], density: [46 values], diam1: [46 values], diam2: [46 values], group: [46 values], total_height: [46 values], weight: [46 values] }   // values supplied at runtime
//@ PARAMS ["beta[1]","beta[2]","beta[3]","beta[4]","beta[5]","beta[6]","beta[7]","sigma"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay · reference vs stan · record(beta[1], beta[2], beta[3], beta[4], beta[5], beta[6], beta[7], sigma)

Histogram overlays of the reference and stan draws, 24 bins per parameter, over these plotted ranges:

beta[1]: 4.81 … 5.83
beta[2]: -0.56 … 1.46
beta[3]: 0.15 … 1.79
beta[4]: -0.43 … 1.58
beta[5]: -0.72 … 1.48
beta[6]: -0.33 … 0.48
beta[7]: -1.18 … -0.17
sigma: 0.25 … 0.48

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.0131 ≤ tol 0.0453 · floors 0.0203/0.0226

posteriordb-mesquite / logmesquite_logva

answer record(beta[1], beta[2], beta[3], beta[4], sigma) stan pass 0.0103

00 statement source: posteriordb/mesquite-logmesquite_logva

given

For each of N = 46 plants, the data provide the plant's dry weight in kilograms, two diameter measurements in meters (diam1 and diam2), canopy height in meters, and a binary group indicator (0 or 1). The model operates on log-transformed variables: log-weight is the natural logarithm of weight; log-canopy-volume is the logarithm of diam1 times diam2 times canopy-height; log-canopy-area is the logarithm of diam1 times diam2. The regression has four coefficients—an intercept, a slope on log-canopy-volume, a slope on log-canopy-area, and a slope on the group indicator—each with a flat (improper uniform) prior over the reals. The error standard deviation sigma, constrained positive, has a flat (improper uniform) prior over the positive reals.
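Here is a small sketch of the two derived predictors, using a hypothetical helper name `canopy_predictors`. Because log(diam1 * diam2 * canopy_height) = log(diam1 * diam2) + log(canopy_height), the difference between the two predictors is simply log canopy height. The model can therefore be read, equivalently, as a regression on log canopy area and log canopy height, with slopes beta[2] + beta[3] and beta[2] respectively.

```python
import math

def canopy_predictors(diam1, diam2, canopy_height):
    """Return (log canopy volume, log canopy area) for one plant."""
    volume = diam1 * diam2 * canopy_height  # canopy volume proxy
    area = diam1 * diam2                    # canopy area proxy
    return math.log(volume), math.log(area)
```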

model

Each plant's log-transformed weight is normally distributed with standard deviation sigma, common to all plants. The mean is the intercept plus three terms: the slope on log-canopy-volume times the plant's log canopy volume, the slope on log-canopy-area times its log canopy area, and the slope on the group indicator times its group value (0 or 1).
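With flat priors on the coefficients, the conditional posterior of beta given sigma is Normal and centered at the ordinary least-squares estimate. The marginal posteriors in the overlay should therefore be centered near the OLS fit. The NumPy sketch below performs that sanity check. The helper name is hypothetical. For this model, the design matrix columns would be an intercept, log canopy volume, log canopy area, and group.

```python
import numpy as np

def posterior_center(X, y):
    """Least-squares estimate, the center of the flat-prior posterior for beta.

    X: (n, p) design matrix whose first column is all ones (intercept).
    y: (n,) response vector, here log weight.
    """
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    # return the coefficients and the residual sum of squares
    return beta_hat, float(resid @ resid)
```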

query

The marginal posterior distributions of the five parameters: the intercept (reported as beta[1]), the slope on log-canopy-volume (reported as beta[2]), the slope on log-canopy-area (reported as beta[3]), the slope on the group indicator (reported as beta[4]), and the error standard deviation sigma.

answer spec record(beta[1], beta[2], beta[3], beta[4], sigma)

{
  "kind": "record",
  "fields": {
    "beta[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[3]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[4]": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization 0.010

stan

data {
  int<lower=0> N;
  vector[N] weight;
  vector[N] diam1;
  vector[N] diam2;
  vector[N] canopy_height;
  vector[N] group;
}
transformed data {
  vector[N] log_weight;
  vector[N] log_canopy_volume;
  vector[N] log_canopy_area;
  log_weight = log(weight);
  log_canopy_volume = log(diam1 .* diam2 .* canopy_height);
  log_canopy_area = log(diam1 .* diam2);
}
parameters {
  vector[4] beta;
  real<lower=0> sigma;
}
model {
  log_weight ~ normal(beta[1] + beta[2] * log_canopy_volume
                      + beta[3] * log_canopy_area + beta[4] * group, sigma);
}

//@ DATA { N: 46, canopy_height: [46 values], diam1: [46 values], diam2: [46 values], group: [46 values], weight: [46 values] }   // values supplied at runtime
//@ PARAMS ["beta[1]","beta[2]","beta[3]","beta[4]","sigma"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay · reference vs stan · record(beta[1], beta[2], beta[3], beta[4], sigma)

Histogram overlays of the reference and stan draws, 24 bins per parameter, over these plotted ranges:

beta[1]: 4.89 … 5.56
beta[2]: 0.10 … 1.46
beta[3]: -0.71 … 0.99
beta[4]: -1.00 … -0.18
sigma: 0.25 … 0.47

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.0103 ≤ tol 0.0395 · floors 0.0115/0.0184

posteriordb-mesquite / logmesquite_logvas

answer record(beta[1], beta[2], beta[3], beta[4], beta[5], beta[6], beta[7], sigma) stan pass 0.0136

00 statement source: posteriordb/mesquite-logmesquite_logvas

given

For each of N = 46 trees, the data provide the tree's weight and six raw measurements: two perpendicular canopy diameters (diam1 and diam2), canopy height, total height, wood density, and a grouping indicator. Five of the six predictors are log-transformed combinations of these measurements: log-canopy-volume is the log of diam1 times diam2 times canopy_height; log-canopy-area is the log of diam1 times diam2; log-canopy-shape is the log of diam1 divided by diam2; log-total-height is the log of total_height; and log-density is the log of density. The sixth predictor, group, enters untransformed. The response is log-weight, the natural logarithm of the tree weight. The regression has seven coefficients: an intercept beta[1] and six slopes beta[2] through beta[7] for the six predictors in the order listed (log-canopy-volume, log-canopy-area, log-canopy-shape, log-total-height, log-density, and group). Each coefficient has a flat (improper uniform) prior over the real line. The error standard deviation sigma, constrained positive, has a flat (improper uniform) prior over the positive reals.
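The three canopy terms are an invertible linear recombination of log diam1, log diam2, and log canopy height. This design therefore spans the same columns as the logmesquite model, which uses those logs directly along with log total height, log density, and group. The sketch below uses hypothetical helper names. It computes the terms as the transformed data block does and then inverts them.

```python
import math

def logvas_canopy_terms(diam1, diam2, canopy_height):
    """Return (log volume, log area, log shape) as in the transformed data block."""
    return (math.log(diam1 * diam2 * canopy_height),
            math.log(diam1 * diam2),
            math.log(diam1 / diam2))

def recover_log_measurements(log_volume, log_area, log_shape):
    """Invert the recombination back to (log diam1, log diam2, log canopy height)."""
    log_d1 = 0.5 * (log_area + log_shape)  # (l1 + l2 + l1 - l2) / 2
    log_d2 = 0.5 * (log_area - log_shape)  # (l1 + l2 - l1 + l2) / 2
    log_h = log_volume - log_area          # removes l1 + l2 from the volume term
    return log_d1, log_d2, log_h
```

If flat priors are assumed for both models, one consequence follows. The fitted values and the posterior for sigma should coincide, and the two beta posteriors should be linear transformations of each other.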

model

Each tree's log-weight is normally distributed with standard deviation sigma, common to all observations. Its mean is the linear predictor: the intercept beta[1] plus beta[2] times log-canopy-volume, plus beta[3] times log-canopy-area, plus beta[4] times log-canopy-shape, plus beta[5] times log-total-height, plus beta[6] times log-density, plus beta[7] times the group indicator.

query

The marginal posterior distributions of the eight parameters: the intercept beta[1], the six slope coefficients beta[2], beta[3], beta[4], beta[5], beta[6], and beta[7] (corresponding to log-canopy-volume, log-canopy-area, log-canopy-shape, log-total-height, log-density, and group respectively), and the error standard deviation sigma.

answer spec record(beta[1], beta[2], beta[3], beta[4], beta[5], beta[6], beta[7], sigma)

{
  "kind": "record",
  "fields": {
    "beta[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[3]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[4]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[5]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[6]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[7]": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization 0.014

stan

data {
  int<lower=0> N;
  vector[N] weight;
  vector[N] diam1;
  vector[N] diam2;
  vector[N] canopy_height;
  vector[N] total_height;
  vector[N] density;
  vector[N] group;
}
transformed data {
  vector[N] log_weight;
  vector[N] log_canopy_volume;
  vector[N] log_canopy_area;
  vector[N] log_canopy_shape;
  vector[N] log_total_height;
  vector[N] log_density;
  log_weight = log(weight);
  log_canopy_volume = log(diam1 .* diam2 .* canopy_height);
  log_canopy_area = log(diam1 .* diam2);
  log_canopy_shape = log(diam1 ./ diam2);
  log_total_height = log(total_height);
  log_density = log(density);
}
parameters {
  vector[7] beta;
  real<lower=0> sigma;
}
model {
  log_weight ~ normal(beta[1] + beta[2] * log_canopy_volume
                      + beta[3] * log_canopy_area
                      + beta[4] * log_canopy_shape
                      + beta[5] * log_total_height + beta[6] * log_density
                      + beta[7] * group, sigma);
}

//@ DATA { N: 46, canopy_height: [46 values], density: [46 values], diam1: [46 values], diam2: [46 values], group: [46 values], total_height: [46 values], weight: [46 values] }   // values supplied at runtime
//@ PARAMS ["beta[1]","beta[2]","beta[3]","beta[4]","beta[5]","beta[6]","beta[7]","sigma"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay · reference vs stan · record(beta[1], beta[2], beta[3], beta[4], beta[5], beta[6], beta[7], sigma)

Histogram overlays of the reference and stan draws, 24 bins per parameter, over these plotted ranges:

beta[1]: 4.73 … 5.94
beta[2]: -0.60 … 1.25
beta[3]: -0.53 … 1.37
beta[4]: -1.14 … 0.40
beta[5]: -0.85 … 1.24
beta[6]: -0.35 … 0.58
beta[7]: -1.02 … -0.18
sigma: 0.25 … 0.52

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.0136 ≤ tol 0.0523 · floors 0.0152/0.0212

posteriordb-mesquite / logmesquite_logvash

answer record(beta[1], beta[2], beta[3], beta[4], beta[5], beta[6], sigma) stan pass 0.0137

00 statement source: posteriordb/mesquite-logmesquite_logvash

given

For each of N = 46 plants, the data provide the plant's weight (in kg), two diameter measurements (diam1 and diam2), canopy height, total height, and a grouping indicator. The response is the log of weight. The regression uses five derived predictors: log(diam1 times diam2 times canopy_height), log(diam1 times diam2), log(diam1 divided by diam2), log(total_height), and the untransformed grouping indicator. The six regression coefficients, beta[1] (the intercept) through beta[6], have flat (improper uniform) priors over the real line. The residual standard deviation sigma, constrained positive, has a flat (improper uniform) prior over the positive reals.

model

Each plant's log-transformed weight is normally distributed around a linear predictor built from four log-transformed geometric covariates and the grouping indicator. The predictor is the intercept beta[1] plus beta[2] times log(diam1 times diam2 times canopy_height), plus beta[3] times log(diam1 times diam2), plus beta[4] times log(diam1 divided by diam2), plus beta[5] times log(total_height), plus beta[6] times the grouping indicator. The standard deviation sigma is common to all plants.

query

The marginal posterior distributions of each of the seven parameters: beta[1] (the intercept), beta[2] (the coefficient for log canopy volume), beta[3] (the coefficient for log canopy area), beta[4] (the coefficient for log canopy shape), beta[5] (the coefficient for log total height), beta[6] (the coefficient for the grouping indicator), and sigma (the residual standard deviation).

answer spec record(beta[1], beta[2], beta[3], beta[4], beta[5], beta[6], sigma)

{
  "kind": "record",
  "fields": {
    "beta[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[3]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[4]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[5]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[6]": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization 0.014

stan

data {
  int<lower=0> N;
  vector[N] weight;
  vector[N] diam1;
  vector[N] diam2;
  vector[N] canopy_height;
  vector[N] total_height;
  vector[N] group;
}
transformed data {
  vector[N] log_weight;
  vector[N] log_canopy_volume;
  vector[N] log_canopy_area;
  vector[N] log_canopy_shape;
  vector[N] log_total_height;
  log_weight = log(weight);
  log_canopy_volume = log(diam1 .* diam2 .* canopy_height);
  log_canopy_area = log(diam1 .* diam2);
  log_canopy_shape = log(diam1 ./ diam2);
  log_total_height = log(total_height);
}
parameters {
  vector[6] beta;
  real<lower=0> sigma;
}
model {
  log_weight ~ normal(beta[1] + beta[2] * log_canopy_volume
                      + beta[3] * log_canopy_area
                      + beta[4] * log_canopy_shape
                      + beta[5] * log_total_height + beta[6] * group, sigma);
}

//@ DATA { N: 46, canopy_height: [46 values], diam1: [46 values], diam2: [46 values], group: [46 values], total_height: [46 values], weight: [46 values] }   // values supplied at runtime
//@ PARAMS ["beta[1]","beta[2]","beta[3]","beta[4]","beta[5]","beta[6]","sigma"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay · reference vs stan · record(beta[1], beta[2], beta[3], beta[4], beta[5], beta[6], sigma)

Histogram overlays of the reference and stan draws, 24 bins per parameter, over these plotted ranges:

beta[1]: 4.78 … 5.92
beta[2]: -0.59 … 1.25
beta[3]: -0.44 … 1.42
beta[4]: -1.17 … 0.36
beta[5]: -0.51 … 1.42
beta[6]: -0.93 … -0.09
sigma: 0.25 … 0.47

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.0137 ≤ tol 0.0554 · floors 0.0215/0.0277

posteriordb-mesquite / logmesquite_logvolume

answer record(beta[1], beta[2], sigma) stan pass 0.0030

00 statement source: posteriordb/mesquite-logmesquite_logvolume

given

For each of N mesquite trees, the data provide the tree's weight (positive real value), two diameter measurements (diam1 and diam2, both positive reals), and canopy height (positive real). The model operates on log-transformed variables: the natural logarithm of weight as the response, and the natural logarithm of canopy volume (computed as the product diam1 * diam2 * canopy_height) as the predictor. The two regression coefficients beta[1] and beta[2] each have a flat (improper uniform) prior over the real line. The error standard deviation sigma, constrained positive, has a flat (improper uniform) prior over the positive reals.
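With a single predictor, the least-squares fit has a closed form, and that fit is where the flat-prior posterior for (beta[1], beta[2]) is centered. On the log-log scale the slope reads as an elasticity: weight ≈ exp(beta[1]) * volume^beta[2]. A 1% larger canopy volume therefore corresponds to a roughly beta[2]% heavier tree. Below is a dependency-free sketch with a hypothetical function name.

```python
def simple_regression(x, y):
    """Least-squares intercept and slope for y = b1 + b2 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # centered cross-product and sum of squares
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope
```

In this model, x would be the log canopy volume and y the log weight of each tree.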

model

Each tree's log-weight is normally distributed with a mean equal to the intercept plus the slope times the tree's log-canopy-volume, and a common standard deviation sigma across all trees.

query

The marginal posterior distribution of each of the three parameters: the intercept (reported as beta[1]), the slope on log-canopy-volume (reported as beta[2]), and the error standard deviation sigma.

answer spec record(beta[1], beta[2], sigma)

{
  "kind": "record",
  "fields": {
    "beta[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization 0.003

stan

data {
  int<lower=0> N;
  vector[N] weight;
  vector[N] diam1;
  vector[N] diam2;
  vector[N] canopy_height;
}
transformed data {
  vector[N] log_weight;
  vector[N] log_canopy_volume;
  log_weight = log(weight);
  log_canopy_volume = log(diam1 .* diam2 .* canopy_height);
}
parameters {
  vector[2] beta;
  real<lower=0> sigma;
}
model {
  log_weight ~ normal(beta[1] + beta[2] * log_canopy_volume, sigma);
}

//@ DATA { N: 46, canopy_height: [46 values], diam1: [46 values], diam2: [46 values], weight: [46 values] }   // values supplied at runtime
//@ PARAMS ["beta[1]","beta[2]","sigma"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay · reference vs stan · record(beta[1], beta[2], sigma)

Histogram overlays of the reference and stan draws, 24 bins per parameter, over these plotted ranges:

beta[1]: 4.88 … 5.51
beta[2]: 0.53 … 0.90
sigma: 0.33 … 0.65

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.0030 ≤ tol 0.0123 · floors 0.0037/0.0050

posteriordb-mesquite / mesquite

answer record(beta[1], beta[2], beta[3], beta[4], beta[5], beta[6], beta[7], sigma) stan pass 7.6474

00 statement source: posteriordb/mesquite-mesquite

given

For each of N mesquite trees, the data provide the tree's weight and six predictors: two diameter measurements (diam1 and diam2), canopy height, total height, wood density, and a group indicator. The regression has seven coefficients: an intercept and one slope per predictor, each with a flat (improper uniform) prior over the real line. The error standard deviation sigma, constrained to be positive, has a flat (improper uniform) prior over the positive reals.

model

Each tree's weight, on the raw (untransformed) scale, is normally distributed with standard deviation sigma, common to all observations. Its mean is the intercept plus six terms, each a slope coefficient times the corresponding predictor (diam1, diam2, canopy_height, total_height, density, and group).
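Because every prior in this model is flat, the coefficients can be integrated out analytically, which gives a quick check on the sigma marginal. The derivation below is a standard result stated under that assumption. It uses n = 46 observations (from the DATA line), p = 7 coefficients, design matrix X, and the least-squares fit \hat\beta. The factor \sigma^{p} from the Gaussian integral over the p coefficients partly cancels the likelihood's \sigma^{-n}, and the flat prior on sigma contributes nothing.

```latex
\begin{aligned}
p(\sigma \mid y) &\propto \int \sigma^{-n}
  \exp\!\left(-\frac{S + (\beta-\hat\beta)^\top X^\top X\,(\beta-\hat\beta)}{2\sigma^2}\right) d\beta
  \;\propto\; \sigma^{-(n-p)} \exp\!\left(-\frac{S}{2\sigma^2}\right),
  \qquad S = \lVert y - X\hat\beta \rVert^2, \\
\sigma^2 \mid y &\sim \text{Inv-Gamma}\!\left(\frac{n-p-1}{2},\ \frac{S}{2}\right)
  = \text{Inv-Gamma}\!\left(19,\ \frac{S}{2}\right) \quad (n = 46,\ p = 7),
  \qquad \mathbb{E}[\sigma^2 \mid y] = \frac{S}{36}.
\end{aligned}
```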

query

The marginal posterior distribution of each of the eight parameters: the intercept (reported as beta[1]), the six predictor slopes (reported as beta[2] through beta[7] for diam1, diam2, canopy_height, total_height, density, and group respectively), and the error standard deviation sigma.

answer spec record(beta[1], beta[2], beta[3], beta[4], beta[5], beta[6], beta[7], sigma)

{
  "kind": "record",
  "fields": {
    "beta[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[3]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[4]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[5]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[6]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[7]": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization 7.647

stan

data {
  int<lower=0> N;
  vector[N] weight;
  vector[N] diam1;
  vector[N] diam2;
  vector[N] canopy_height;
  vector[N] total_height;
  vector[N] density;
  vector[N] group;
}
parameters {
  vector[7] beta;
  real<lower=0> sigma;
}
model {
  weight ~ normal(beta[1] + beta[2] * diam1 + beta[3] * diam2
                  + beta[4] * canopy_height + beta[5] * total_height
                  + beta[6] * density + beta[7] * group, sigma);
}

//@ DATA { N: 46, canopy_height: [46 values], density: [46 values], diam1: [46 values], diam2: [46 values], group: [46 values], total_height: [46 values], weight: [46 values] }   // values supplied at runtime
//@ PARAMS ["beta[1]","beta[2]","beta[3]","beta[4]","beta[5]","beta[6]","beta[7]","sigma"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay · reference vs stan · record(beta[1], beta[2], beta[3], beta[4], beta[5], beta[6], beta[7], sigma)

Histogram overlays of the reference and stan draws, 24 bins per parameter, over these plotted ranges:

beta[1]: -1278 … -191
beta[2]: -213 … 643
beta[3]: -22.3 … 828
beta[4]: -383 … 1024
beta[5]: -875 … 455
beta[6]: -10.8 … 233
beta[7]: -814 … -93.6
sigma: 208 … 441

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=7.6474 ≤ tol 29.6690 · floors 9.2119/14.1346

posteriordb-nes1972 / nes

answer record(beta[1], beta[2], beta[3], beta[4], beta[5], beta[6], beta[7], beta[8], beta[9], sigma) stan pass 0.0204

00 statement source: posteriordb/nes1972-nes

given

For N = 1330 respondents, the data provide a party identification score (partyid7) on a 7-point ordinal scale; a self-reported ideology rating (real_ideo); race coding (race_adj); education level (educ1); gender (gender); income level (income); and age group (age_discrete, coded 1 = under 30, 2 = 30 to 44, 3 = 45 to 64, 4 = 65 and over). The model constructs three dummy variables for age groups: age30_44 (1 if age_discrete = 2, else 0), age45_64 (1 if age_discrete = 3, else 0), and age65up (1 if age_discrete = 4, else 0), with the under-30 group as the reference category. The regression has nine coefficients (an intercept and eight slopes), each with a flat (improper uniform) prior over the real line. The error standard deviation sigma, constrained positive, has a flat (improper uniform) prior over the positive reals.
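The transformed data block builds the age dummies by comparison, for example `age30_44[n] = age_discrete[n] == 2`. The sketch below reproduces the same encoding in Python, as a hypothetical helper for preparing or checking data outside Stan.

```python
def age_dummies(age_discrete):
    """Map codes 1-4 to (age30_44, age45_64, age65up); code 1 (under 30) is the baseline."""
    # each indicator is 1.0 when the code matches its bracket, else 0.0
    return [(float(a == 2), float(a == 3), float(a == 4)) for a in age_discrete]
```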

model

Each respondent's party identification score is normally distributed with a mean equal to the intercept plus slopes for ideology, race, the three age-group dummy variables, education, gender, and income, each multiplied by the corresponding predictor value. The standard deviation of this normal distribution is sigma, common across all respondents.

query

The marginal posterior distributions of the ten parameters: the intercept (reported as beta[1]), the ideology slope (beta[2]), the race slope (beta[3]), the slope for age 30-44 (beta[4]), the slope for age 45-64 (beta[5]), the slope for age 65 and over (beta[6]), the education slope (beta[7]), the gender slope (beta[8]), the income slope (beta[9]), and the error standard deviation sigma.

answer spec record(beta[1], beta[2], beta[3], beta[4], beta[5], beta[6], beta[7], beta[8], beta[9], sigma)

{
  "kind": "record",
  "fields": {
    "beta[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[3]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[4]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[5]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[6]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[7]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[8]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[9]": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization 0.020

stan

data {
  int<lower=0> N;
  vector[N] partyid7;
  vector[N] real_ideo;
  vector[N] race_adj;
  vector[N] educ1;
  vector[N] gender;
  vector[N] income;
  array[N] int age_discrete;
}
transformed data {
  vector[N] age30_44; // age as factor
  vector[N] age45_64;
  vector[N] age65up;

  for (n in 1 : N) {
    age30_44[n] = age_discrete[n] == 2;
    age45_64[n] = age_discrete[n] == 3;
    age65up[n] = age_discrete[n] == 4;
  }
}
parameters {
  vector[9] beta;
  real<lower=0> sigma;
}
model {
  // vectorization
  partyid7 ~ normal(beta[1] + beta[2] * real_ideo + beta[3] * race_adj
                    + beta[4] * age30_44 + beta[5] * age45_64
                    + beta[6] * age65up + beta[7] * educ1 + beta[8] * gender
                    + beta[9] * income, sigma);
}

//@ DATA { N: 1330, age_discrete: [1330 values], educ1: [1330 values], gender: [1330 values], income: [1330 values], partyid7: [1330 values], race_adj: [1330 values], real_ideo: [1330 values] }   // values supplied at runtime
//@ PARAMS ["beta[1]","beta[2]","beta[3]","beta[4]","beta[5]","beta[6]","beta[7]","beta[8]","beta[9]","sigma"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay · reference vs stan · record(beta[1], beta[2], beta[3], beta[4], beta[5], beta[6], beta[7], beta[8], beta[9], sigma)

Histogram overlays of the reference and stan draws, 24 bins per parameter, over these plotted ranges:

beta[1]: 0.58 … 2.79
beta[2]: 0.36 … 0.63
beta[3]: -1.70 … -0.53
beta[4]: -0.59 … 0.23
beta[5]: -0.43 … 0.36
beta[6]: -0.02 … 1.09
beta[7]: 0.11 … 0.50
beta[8]: -0.41 … 0.32
beta[9]: 0.01 … 0.33
sigma: 1.78 … 2.00

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.0204 ≤ tol 0.0745 · floors 0.0372/0.0286

posteriordb-nes1976 / nes

answer record(beta[1], beta[2], beta[3], beta[4], beta[5], beta[6], beta[7], beta[8], beta[9], sigma) stan pass 0.0131

00 statement source: posteriordb/nes1976-nes

given

For each of N = 1184 respondents, the data provide a party identification measure on a 7-point scale (partyid7), five numeric predictors (ideological self-placement, adjusted race, education level, gender, and income), and a categorical age variable with four levels. The regression has nine coefficients: an intercept and eight slopes, one each for ideological self-placement, adjusted race, three binary indicators for the age categories 30-44, 45-64, and 65+, education level, gender, and income. Each coefficient has a flat (improper uniform) prior over the reals. The baseline age category (under 30) is absorbed into the intercept. The error standard deviation sigma, constrained positive, has an improper uniform prior over the positive reals.

model

Each respondent's party identification is normally distributed with standard deviation sigma, common to all respondents. The mean is the intercept plus slope coefficients times the respondent's ideological self-placement, adjusted race, three binary age indicators (30-44, 45-64, and 65+, with the under-30 baseline absorbed into the intercept), education level, gender, and income.
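"Absorbed into the intercept" is easiest to see in an age-only fit. With the under-30 group as baseline, the least-squares intercept equals that group's mean response. Each dummy coefficient equals its group's mean minus the baseline mean. In the full model, the same reading holds with the other predictors held fixed. Below is a NumPy sketch with a hypothetical helper name and toy data.

```python
import numpy as np

def fit_age_only(age_discrete, y):
    """Least-squares fit of y on an intercept plus the three age dummies."""
    age = np.asarray(age_discrete)
    # columns: intercept, age30_44, age45_64, age65up (under 30 is the baseline)
    X = np.column_stack([np.ones(len(age)), age == 2, age == 3, age == 4]).astype(float)
    beta_hat, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta_hat
```

For example, with group means 2, 5, 2, and 8 for the four age codes, the fit returns an intercept of 2 and dummy coefficients of 3, 0, and 6.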

query

The marginal posterior distribution of each of the ten parameters: the intercept (reported as beta[1]), the slope on ideological self-placement (reported as beta[2]), the slope on adjusted race (reported as beta[3]), the slope on the 30-44 age indicator (reported as beta[4]), the slope on the 45-64 age indicator (reported as beta[5]), the slope on the 65+ age indicator (reported as beta[6]), the slope on education level (reported as beta[7]), the slope on gender (reported as beta[8]), the slope on income (reported as beta[9]), and the error standard deviation sigma.

answer spec record(beta[1], beta[2], beta[3], beta[4], beta[5], beta[6], beta[7], beta[8], beta[9], sigma)

{
  "kind": "record",
  "fields": {
    "beta[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[3]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[4]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[5]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[6]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[7]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[8]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[9]": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization 0.013

stan

data {
  int<lower=0> N;
  vector[N] partyid7;
  vector[N] real_ideo;
  vector[N] race_adj;
  vector[N] educ1;
  vector[N] gender;
  vector[N] income;
  array[N] int age_discrete;
}
transformed data {
  vector[N] age30_44; // age as factor
  vector[N] age45_64;
  vector[N] age65up;

  for (n in 1 : N) {
    age30_44[n] = age_discrete[n] == 2;
    age45_64[n] = age_discrete[n] == 3;
    age65up[n] = age_discrete[n] == 4;
  }
}
parameters {
  vector[9] beta;
  real<lower=0> sigma;
}
model {
  // vectorization
  partyid7 ~ normal(beta[1] + beta[2] * real_ideo + beta[3] * race_adj
                    + beta[4] * age30_44 + beta[5] * age45_64
                    + beta[6] * age65up + beta[7] * educ1 + beta[8] * gender
                    + beta[9] * income, sigma);
}

//@ DATA { N: 1184, age_discrete: [1184 values], educ1: [1184 values], gender: [1184 values], income: [1184 values], partyid7: [1184 values], race_adj: [1184 values], real_ideo: [1184 values] }   // values supplied at runtime
//@ PARAMS ["beta[1]","beta[2]","beta[3]","beta[4]","beta[5]","beta[6]","beta[7]","beta[8]","beta[9]","sigma"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay · reference vs stan · record(beta[1], beta[2], beta[3], beta[4], beta[5], beta[6], beta[7], beta[8], beta[9], sigma)

Histogram overlays of the reference and stan draws, 24 bins per parameter, over these plotted ranges:

beta[1]: -0.47 … 2.24
beta[2]: 0.47 … 0.70
beta[3]: -1.74 … -0.57
beta[4]: -0.59 … 0.40
beta[5]: -0.54 … 0.48
beta[6]: -0.10 … 1.01
beta[7]: 0.07 … 0.46
beta[8]: -0.16 … 0.46
beta[9]: -0.01 … 0.34
sigma: 1.68 … 1.92

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.0131 ≤ tol 0.0395 · floors 0.0173/0.0194

posteriordb-nes1980 / nes

answer record(beta[1], beta[2], beta[3], beta[4], beta[5], beta[6], beta[7], beta[8], beta[9], sigma) stan pass 0.0186

00 statement source: posteriordb/nes1980-nes

given

For each of N respondents, the data provide party identification on a continuous scale (partyid7), respondent ideology (real_ideo), race adjustment (race_adj), education level (educ1), gender (gender), income level (income), and a discrete age group indicator (age_discrete: 1 for under 30, 2 for ages 30-44, 3 for ages 45-64, 4 for age 65 and over). The model includes nine regression coefficients (an intercept and eight slopes), each with a flat (improper uniform) prior over the real line. The error standard deviation sigma, constrained positive, has an improper uniform prior over the positive reals.

model

Each respondent's party identification is normally distributed with a mean equal to a linear combination of an intercept, slopes on ideology, race adjustment, three age group indicators (ages 30-44, 45-64, and 65-plus), education, gender, and income, and a common standard deviation sigma across all respondents. The age group indicators are binary variables derived from the discrete age group: one for each age bracket (with the under-30 group as reference), taking value 1 if the respondent falls in that bracket and 0 otherwise.
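The linear predictor described above can be written compactly as follows. Here $a_n$ is shorthand, introduced only for this display, for age_discrete of respondent $n$, and $\mathbb{1}[\cdot]$ is the 0/1 indicator, matching the age30_44, age45_64, and age65up vectors built in the transformed data block of the Stan program below. The second argument of $\mathcal{N}$ is a standard deviation, as in Stan's normal.

```latex
\begin{aligned}
\mu_n &= \beta_1 + \beta_2\,\mathrm{real\_ideo}_n + \beta_3\,\mathrm{race\_adj}_n
        + \beta_4\,\mathbb{1}[a_n = 2] + \beta_5\,\mathbb{1}[a_n = 3] + \beta_6\,\mathbb{1}[a_n = 4] \\
      &\quad + \beta_7\,\mathrm{educ1}_n + \beta_8\,\mathrm{gender}_n + \beta_9\,\mathrm{income}_n, \\
\mathrm{partyid7}_n &\sim \mathcal{N}(\mu_n,\ \sigma), \qquad n = 1, \dots, N.
\end{aligned}
```

A respondent under 30 ($a_n = 1$) has all three indicators equal to 0, so $\beta_1$ is the expected partyid7 for that reference group when the other predictors are zero, and $\beta_4, \beta_5, \beta_6$ are shifts relative to it.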

query

The marginal posterior distributions of each of the ten parameters: the intercept (reported as beta[1]), the slope on ideology (reported as beta[2]), the slope on race adjustment (reported as beta[3]), the slope on the 30-44 age group indicator (reported as beta[4]), the slope on the 45-64 age group indicator (reported as beta[5]), the slope on the 65+ age group indicator (reported as beta[6]), the slope on education (reported as beta[7]), the slope on gender (reported as beta[8]), the slope on income (reported as beta[9]), and the error standard deviation sigma.

answer spec record(beta[1], beta[2], beta[3], beta[4], beta[5], beta[6], beta[7], beta[8], beta[9], sigma)

{
  "kind": "record",
  "fields": {
    "beta[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[3]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[4]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[5]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[6]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[7]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[8]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[9]": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). These are stored draws, not program code; their realized marginals appear in the answer overlay below.

◆ realization 0.019

stan

data {
  int<lower=0> N;
  vector[N] partyid7;
  vector[N] real_ideo;
  vector[N] race_adj;
  vector[N] educ1;
  vector[N] gender;
  vector[N] income;
  array[N] int age_discrete;
}
transformed data {
  vector[N] age30_44; // age as factor
  vector[N] age45_64;
  vector[N] age65up;

  for (n in 1 : N) {
    age30_44[n] = age_discrete[n] == 2;
    age45_64[n] = age_discrete[n] == 3;
    age65up[n] = age_discrete[n] == 4;
  }
}
parameters {
  vector[9] beta;
  real<lower=0> sigma;
}
model {
  // vectorization
  partyid7 ~ normal(beta[1] + beta[2] * real_ideo + beta[3] * race_adj
                    + beta[4] * age30_44 + beta[5] * age45_64
                    + beta[6] * age65up + beta[7] * educ1 + beta[8] * gender
                    + beta[9] * income, sigma);
}

//@ DATA { N: 701, age_discrete: [701 values], educ1: [701 values], gender: [701 values], income: [701 values], partyid7: [701 values], race_adj: [701 values], real_ideo: [701 values] }   // values supplied at runtime
//@ PARAMS ["beta[1]","beta[2]","beta[3]","beta[4]","beta[5]","beta[6]","beta[7]","beta[8]","beta[9]","sigma"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay: reference vs stan · record(beta[1], beta[2], beta[3], beta[4], beta[5], beta[6], beta[7], beta[8], beta[9], sigma)

Each marginal is plotted as overlaid 24-bin histograms of the reference and stan draws over the range shown.

beta[1]: -0.25 to 3.48
beta[2]: 0.43 to 0.81
beta[3]: -1.99 to -0.56
beta[4]: -0.69 to 0.45
beta[5]: -0.90 to 0.23
beta[6]: -0.67 to 0.79
beta[7]: -0.18 to 0.34
beta[8]: -0.49 to 0.44
beta[9]: -0.02 to 0.47
sigma: 1.71 to 1.98

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.0186 ≤ tol 0.0536 · floors 0.0268/0.0268

posteriordb-nes1984 / nes

answer record(beta[1], beta[2], beta[3], beta[4], beta[5], beta[6], beta[7], beta[8], beta[9], sigma) stan pass 0.0221

00 statement source: posteriordb/nes1984-nes

given

For each of N observations, the data provide a response variable (party identification, partyid7), five base predictors (ideological position, race adjustment, education level, gender, and income), and a discrete age category (with values 1, 2, 3, 4 corresponding to under 30, 30-44, 45-64, and 65+). The regression includes nine coefficients: an intercept, slopes for each of the five base predictors, and slopes for three binary indicators derived from the age categories (for ages 30-44, 45-64, and 65+, with ages under 30 as the implicit reference). Each of the nine coefficients has a flat (improper uniform) prior over the real line. The error standard deviation sigma, constrained to be positive, has a flat (improper uniform) prior over the positive reals.

model

A nine-coefficient linear regression model for party identification. Each observation's party identification value is normally distributed with a mean equal to the intercept plus a weighted sum of five base predictors (ideological position, race adjustment, education level, gender, and income) and three binary age indicators (for 30-44, 45-64, and 65+, with under 30 as the reference). The standard deviation of the normal distribution is sigma, shared across all observations.
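Because all nine coefficients have flat priors and sigma has a flat prior on the positive reals, the posterior is proportional to the likelihood, and the coefficient marginals can be worked out in closed form. The following is a sketch of the standard derivation, assuming the N × 9 design matrix $X$ (a column of ones followed by the eight predictor columns, in the order of beta[2] through beta[9]) has full column rank and N > 10. The symbols $\hat\beta$ (the least-squares estimate) and SSR (its residual sum of squares) are notation introduced here, not quantities reported on this page.

```latex
\begin{aligned}
p(\beta, \sigma \mid y) &\propto \sigma^{-N} \exp\!\Big(-\frac{\lVert y - X\beta \rVert^2}{2\sigma^2}\Big), \qquad \sigma > 0, \\
\lVert y - X\beta \rVert^2 &= \mathrm{SSR} + (\beta - \hat\beta)^\top X^\top X\,(\beta - \hat\beta), \qquad \hat\beta = (X^\top X)^{-1} X^\top y, \\
p(\beta \mid y) &= \int_0^\infty p(\beta, \sigma \mid y)\, d\sigma
  \;\propto\; \Big[\mathrm{SSR} + (\beta - \hat\beta)^\top X^\top X\,(\beta - \hat\beta)\Big]^{-(N-1)/2}, \\
\beta \mid y &\sim t_{\nu}\big(\hat\beta,\ s^2 (X^\top X)^{-1}\big), \qquad \nu = N - 10, \quad s^2 = \mathrm{SSR}/\nu.
\end{aligned}
```

Under these assumptions each reported beta[j] is marginally a Student-t with N - 10 degrees of freedom centered at its least-squares estimate; for N = 1226 this is very close to a normal distribution. The flat prior on sigma, rather than the common $1/\sigma$ prior, is what makes the degrees of freedom N - 10 instead of N - 9.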

query

The marginal posterior distributions of the ten parameters: beta[1] (the intercept), beta[2] (the slope on ideological position), beta[3] (the slope on race adjustment), beta[4] (the slope on the age 30-44 indicator), beta[5] (the slope on the age 45-64 indicator), beta[6] (the slope on the age 65+ indicator), beta[7] (the slope on education level), beta[8] (the slope on gender), beta[9] (the slope on income), and sigma (the error standard deviation).

answer spec record(beta[1], beta[2], beta[3], beta[4], beta[5], beta[6], beta[7], beta[8], beta[9], sigma)

{
  "kind": "record",
  "fields": {
    "beta[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[3]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[4]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[5]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[6]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[7]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[8]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[9]": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). These are stored draws, not program code; their realized marginals appear in the answer overlay below.

◆ realization 0.022

stan

data {
  int<lower=0> N;
  vector[N] partyid7;
  vector[N] real_ideo;
  vector[N] race_adj;
  vector[N] educ1;
  vector[N] gender;
  vector[N] income;
  array[N] int age_discrete;
}
transformed data {
  vector[N] age30_44; // age as factor
  vector[N] age45_64;
  vector[N] age65up;

  for (n in 1 : N) {
    age30_44[n] = age_discrete[n] == 2;
    age45_64[n] = age_discrete[n] == 3;
    age65up[n] = age_discrete[n] == 4;
  }
}
parameters {
  vector[9] beta;
  real<lower=0> sigma;
}
model {
  // vectorization
  partyid7 ~ normal(beta[1] + beta[2] * real_ideo + beta[3] * race_adj
                    + beta[4] * age30_44 + beta[5] * age45_64
                    + beta[6] * age65up + beta[7] * educ1 + beta[8] * gender
                    + beta[9] * income, sigma);
}

//@ DATA { N: 1226, age_discrete: [1226 values], educ1: [1226 values], gender: [1226 values], income: [1226 values], partyid7: [1226 values], race_adj: [1226 values], real_ideo: [1226 values] }   // values supplied at runtime
//@ PARAMS ["beta[1]","beta[2]","beta[3]","beta[4]","beta[5]","beta[6]","beta[7]","beta[8]","beta[9]","sigma"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay: reference vs stan · record(beta[1], beta[2], beta[3], beta[4], beta[5], beta[6], beta[7], beta[8], beta[9], sigma)

Each marginal is plotted as overlaid 24-bin histograms of the reference and stan draws over the range shown.

beta[1]: 0.97 to 3.35
beta[2]: 0.51 to 0.75
beta[3]: -2.06 to -0.92
beta[4]: -0.70 to 0.19
beta[5]: -1.13 to -0.22
beta[6]: -0.85 to 0.42
beta[7]: -0.17 to 0.25
beta[8]: -0.30 to 0.34
beta[9]: 0.04 to 0.43
sigma: 1.78 to 2.00

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.0221 ≤ tol 0.0816 · floors 0.0408/0.0361

posteriordb-nes1988 / nes

answer record(beta[1], beta[2], beta[3], beta[4], beta[5], beta[6], beta[7], beta[8], beta[9], sigma) stan pass 0.0156

00 statement source: posteriordb/nes1988-nes

given

For each of N = 1113 individuals, the data provide their party identification on a 7-point scale (partyid7), ideology on a continuous scale (real_ideo), adjusted race category (race_adj), education level (educ1), gender (gender), income level (income), and age in discrete categories (age_discrete). The model has nine regression coefficients (beta[1] through beta[9]), each with a flat (improper uniform) prior over the real line. The error standard deviation sigma, constrained positive, has a flat (improper uniform) prior over the positive reals.

model

Each individual's party identification is normally distributed with a mean equal to an intercept plus eight slope terms: the ideology coefficient times the individual's ideology, the race coefficient times the individual's race category, the education coefficient times the individual's education level, the gender coefficient times the individual's gender, the income coefficient times the individual's income, and three age terms. The age terms are built from the discrete age variable as binary indicators for whether the individual is aged 30-44, 45-64, or 65 and over, each multiplied by its own age coefficient. All individuals share a common standard deviation sigma.
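Since the coefficients have flat priors and sigma has a flat prior on the positive reals, the coefficients can be integrated out to give the marginal posterior of sigma in closed form. This is a sketch under the assumptions that the N × 9 design matrix $X$ (an intercept column plus the eight predictor columns) has full column rank and N > 12; SSR, the residual sum of squares of the least-squares fit of partyid7 on $X$, is notation introduced here.

```latex
\begin{aligned}
p(\sigma \mid y) &\propto \int_{\mathbb{R}^9} \sigma^{-N} \exp\!\Big(-\frac{\lVert y - X\beta \rVert^2}{2\sigma^2}\Big)\, d\beta
  \;\propto\; \sigma^{-(N-9)} \exp\!\Big(-\frac{\mathrm{SSR}}{2\sigma^2}\Big), \\
\sigma^2 \mid y &\sim \operatorname{Inv\text{-}Gamma}\Big(\frac{N - 10}{2},\ \frac{\mathrm{SSR}}{2}\Big), \qquad
\mathbb{E}[\sigma^2 \mid y] = \frac{\mathrm{SSR}}{N - 12}.
\end{aligned}
```

The Gaussian integral over the nine coefficients contributes a factor $\sigma^9$, which is why the exponent drops from N to N - 9. With N = 1113 the posterior of sigma is tightly concentrated near $\sqrt{\mathrm{SSR}/(N - 10)}$.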

query

The marginal posterior distributions of each of the ten parameters: the intercept (reported as beta[1]), the ideology slope (beta[2]), the race slope (beta[3]), the age 30-44 slope (beta[4]), the age 45-64 slope (beta[5]), the age 65+ slope (beta[6]), the education slope (beta[7]), the gender slope (beta[8]), the income slope (beta[9]), and the error standard deviation sigma.

answer spec record(beta[1], beta[2], beta[3], beta[4], beta[5], beta[6], beta[7], beta[8], beta[9], sigma)

{
  "kind": "record",
  "fields": {
    "beta[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[3]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[4]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[5]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[6]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[7]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[8]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[9]": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). These are stored draws, not program code; their realized marginals appear in the answer overlay below.

◆ realization 0.016

stan

data {
  int<lower=0> N;
  vector[N] partyid7;
  vector[N] real_ideo;
  vector[N] race_adj;
  vector[N] educ1;
  vector[N] gender;
  vector[N] income;
  array[N] int age_discrete;
}
transformed data {
  vector[N] age30_44; // age as factor
  vector[N] age45_64;
  vector[N] age65up;

  for (n in 1 : N) {
    age30_44[n] = age_discrete[n] == 2;
    age45_64[n] = age_discrete[n] == 3;
    age65up[n] = age_discrete[n] == 4;
  }
}
parameters {
  vector[9] beta;
  real<lower=0> sigma;
}
model {
  // vectorization
  partyid7 ~ normal(beta[1] + beta[2] * real_ideo + beta[3] * race_adj
                    + beta[4] * age30_44 + beta[5] * age45_64
                    + beta[6] * age65up + beta[7] * educ1 + beta[8] * gender
                    + beta[9] * income, sigma);
}

//@ DATA { N: 1113, age_discrete: [1113 values], educ1: [1113 values], gender: [1113 values], income: [1113 values], partyid7: [1113 values], race_adj: [1113 values], real_ideo: [1113 values] }   // values supplied at runtime
//@ PARAMS ["beta[1]","beta[2]","beta[3]","beta[4]","beta[5]","beta[6]","beta[7]","beta[8]","beta[9]","sigma"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay: reference vs stan · record(beta[1], beta[2], beta[3], beta[4], beta[5], beta[6], beta[7], beta[8], beta[9], sigma)

Each marginal is plotted as overlaid 24-bin histograms of the reference and stan draws over the range shown.

beta[1]: 1.87 to 4.67
beta[2]: 0.49 to 0.73
beta[3]: -2.24 to -1.17
beta[4]: -0.73 to 0.13
beta[5]: -0.90 to -0.01
beta[6]: -1.02 to 0.23
beta[7]: -0.07 to 0.33
beta[8]: -0.42 to 0.24
beta[9]: -0.13 to 0.22
sigma: 1.75 to 2.00

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.0156 ≤ tol 0.0584 · floors 0.0292/0.0186

posteriordb-nes1992 / nes

answer record(beta[1], beta[2], beta[3], beta[4], beta[5], beta[6], beta[7], beta[8], beta[9], sigma) stan pass 0.0145

00 statement source: posteriordb/nes1992-nes

given

For each of N = 1350 survey respondents, the data provide a party identification score (partyid7), an ideological self-placement score (real_ideo), a race-adjusted value (race_adj), education level (educ1), gender indicator (gender), income level (income), and an age group indicator (age_discrete, an integer in {1, 2, 3, 4}, where 1 represents under 30, 2 represents 30-44, 3 represents 45-64, and 4 represents 65+). The regression has nine coefficients: an intercept (beta[1]), a slope for real_ideo (beta[2]), a slope for race_adj (beta[3]), three slopes for age group indicators coded as 30-44 (beta[4]), 45-64 (beta[5]), and 65+ (beta[6]) with under-30 as the reference level, and slopes for educ1 (beta[7]), gender (beta[8]), and income (beta[9]). Each regression coefficient has an improper uniform (flat) prior over the reals. The error standard deviation sigma, constrained positive, has an improper uniform prior over the positive reals.

model

Each respondent's party identification score is normally distributed with a mean equal to the intercept plus the sum of products of slopes and predictors (ideological self-placement, race adjustment, age group indicators, education, gender, and income), and a common standard deviation sigma across all respondents.

query

The marginal posterior distributions of each of the ten parameters: the intercept (beta[1]), the coefficient for ideological self-placement (beta[2]), the coefficient for race adjustment (beta[3]), the coefficients for the three age group indicators, 30-44 years old (beta[4]), 45-64 years old (beta[5]), and 65 and older (beta[6]), the coefficient for education (beta[7]), the coefficient for gender (beta[8]), the coefficient for income (beta[9]), and the error standard deviation sigma.

answer spec record(beta[1], beta[2], beta[3], beta[4], beta[5], beta[6], beta[7], beta[8], beta[9], sigma)

{
  "kind": "record",
  "fields": {
    "beta[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[3]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[4]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[5]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[6]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[7]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[8]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[9]": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). These are stored draws, not program code; their realized marginals appear in the answer overlay below.

◆ realization 0.014

stan

data {
  int<lower=0> N;
  vector[N] partyid7;
  vector[N] real_ideo;
  vector[N] race_adj;
  vector[N] educ1;
  vector[N] gender;
  vector[N] income;
  array[N] int age_discrete;
}
transformed data {
  vector[N] age30_44; // age as factor
  vector[N] age45_64;
  vector[N] age65up;

  for (n in 1 : N) {
    age30_44[n] = age_discrete[n] == 2;
    age45_64[n] = age_discrete[n] == 3;
    age65up[n] = age_discrete[n] == 4;
  }
}
parameters {
  vector[9] beta;
  real<lower=0> sigma;
}
model {
  // vectorization
  partyid7 ~ normal(beta[1] + beta[2] * real_ideo + beta[3] * race_adj
                    + beta[4] * age30_44 + beta[5] * age45_64
                    + beta[6] * age65up + beta[7] * educ1 + beta[8] * gender
                    + beta[9] * income, sigma);
}

//@ DATA { N: 1350, age_discrete: [1350 values], educ1: [1350 values], gender: [1350 values], income: [1350 values], partyid7: [1350 values], race_adj: [1350 values], real_ideo: [1350 values] }   // values supplied at runtime
//@ PARAMS ["beta[1]","beta[2]","beta[3]","beta[4]","beta[5]","beta[6]","beta[7]","beta[8]","beta[9]","sigma"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay: reference vs stan · record(beta[1], beta[2], beta[3], beta[4], beta[5], beta[6], beta[7], beta[8], beta[9], sigma)

Each marginal is plotted as overlaid 24-bin histograms of the reference and stan draws over the range shown.

beta[1]: 0.31 to 2.71
beta[2]: 0.61 to 0.80
beta[3]: -1.84 to -0.81
beta[4]: -0.71 to 0.18
beta[5]: -0.99 to -0.09
beta[6]: -0.92 to 0.04
beta[7]: 0.12 to 0.44
beta[8]: -0.34 to 0.21
beta[9]: -0.02 to 0.28
sigma: 1.69 to 1.92

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.0145 ≤ tol 0.0585 · floors 0.0293/0.0170

posteriordb-nes1996 / nes

answer record(beta[1], beta[2], beta[3], beta[4], beta[5], beta[6], beta[7], beta[8], beta[9], sigma) stan pass 0.0138

00 statement source: posteriordb/nes1996-nes

given

For each of N respondents, the data provide the response partyid7 (party identification) and six predictors: ideology on the liberal-conservative spectrum (real_ideo), race (race_adj), education level (educ1), gender (binary), income level (income), and age group as an integer in {1, 2, 3, 4}. The regression has nine coefficients: an intercept and eight slopes, corresponding to ideology, race, three age-group indicators (age 30-44, age 45-64, age 65-plus, with age group 1 as the reference category), education, gender, and income. Each coefficient, including the intercept, has a flat (improper uniform) prior over the real line. The error standard deviation sigma, constrained positive, has a flat (improper uniform) prior over the positive reals.

model

For each respondent, the response partyid7 is modeled as normally distributed with a linear predictor and a common standard deviation sigma. The linear predictor is the intercept plus the sum of the eight slope coefficients each multiplied by the corresponding predictor: ideology, race, three binary age-group indicators (which are deterministically constructed from the discrete age variable, with age group 1 as the reference), education, gender, and income. The three age indicators are one if the respondent belongs to that age group and zero otherwise.

query

The marginal posterior distributions of the nine regression coefficients (the intercept reported as beta[1], and the eight slopes reported as beta[2] through beta[9] for ideology, race, age 30-44, age 45-64, age 65-plus, education, gender, and income respectively) and the error standard deviation sigma.

answer spec record(beta[1], beta[2], beta[3], beta[4], beta[5], beta[6], beta[7], beta[8], beta[9], sigma)

{
  "kind": "record",
  "fields": {
    "beta[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[3]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[4]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[5]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[6]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[7]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[8]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[9]": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). These are stored draws, not program code; their realized marginals appear in the answer overlay below.

◆ realization 0.014

stan

data {
  int<lower=0> N;
  vector[N] partyid7;
  vector[N] real_ideo;
  vector[N] race_adj;
  vector[N] educ1;
  vector[N] gender;
  vector[N] income;
  array[N] int age_discrete;
}
transformed data {
  vector[N] age30_44; // age as factor
  vector[N] age45_64;
  vector[N] age65up;

  for (n in 1 : N) {
    age30_44[n] = age_discrete[n] == 2;
    age45_64[n] = age_discrete[n] == 3;
    age65up[n] = age_discrete[n] == 4;
  }
}
parameters {
  vector[9] beta;
  real<lower=0> sigma;
}
model {
  // vectorization
  partyid7 ~ normal(beta[1] + beta[2] * real_ideo + beta[3] * race_adj
                    + beta[4] * age30_44 + beta[5] * age45_64
                    + beta[6] * age65up + beta[7] * educ1 + beta[8] * gender
                    + beta[9] * income, sigma);
}

//@ DATA { N: 1043, age_discrete: [1043 values], educ1: [1043 values], gender: [1043 values], income: [1043 values], partyid7: [1043 values], race_adj: [1043 values], real_ideo: [1043 values] }   // values supplied at runtime
//@ PARAMS ["beta[1]","beta[2]","beta[3]","beta[4]","beta[5]","beta[6]","beta[7]","beta[8]","beta[9]","sigma"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay: reference vs stan · record(beta[1], beta[2], beta[3], beta[4], beta[5], beta[6], beta[7], beta[8], beta[9], sigma)

Each marginal is plotted as overlaid 24-bin histograms of the reference and stan draws over the range shown.

beta[1]: -1.34 to 1.33
beta[2]: 0.80 to 1.05
beta[3]: -1.67 to -0.71
beta[4]: -0.62 to 0.46
beta[5]: -0.78 to 0.27
beta[6]: -0.77 to 0.42
beta[7]: 0.05 to 0.55
beta[8]: -0.39 to 0.25
beta[9]: 0.05 to 0.36
sigma: 1.58 to 1.79

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.0138 ≤ tol 0.0464 · floors 0.0162/0.0195

posteriordb-nes2000 / nes

answer record(beta[1], beta[2], beta[3], beta[4], beta[5], beta[6], beta[7], beta[8], beta[9], sigma) stan pass 0.0223

00 statement source: posteriordb/nes2000-nes

given

For each of N = 476 respondents, the data provide the respondent's party identification (on a 7-point scale), ideology score, race adjustment value, education code, gender, income code, and a discrete age group (1, 2, 3, or 4, where group 1 is the reference category and groups 2, 3, and 4 are compared against it). The regression has nine coefficients: an intercept and eight slopes for the predictors (ideology, race adjustment, indicators for age groups 2, 3, and 4, education, gender, and income). All nine coefficients have a flat (improper uniform) prior over the real line. The error standard deviation sigma, constrained positive, has a flat (improper uniform) prior over the positive reals.

model

Each respondent's party identification is normally distributed with a mean equal to the intercept plus the sum of eight slope terms: the ideology coefficient times ideology, the race adjustment coefficient times race adjustment, three age group coefficients each multiplied by a binary indicator for membership in age group 2, 3, or 4 respectively (with age group 1 as the reference), the education coefficient times education, the gender coefficient times gender, and the income coefficient times income. The standard deviation of the response is sigma, common across all respondents.

query

The marginal posterior distribution of each of the ten parameters: the intercept (reported as beta[1]), the ideology coefficient (reported as beta[2]), the race adjustment coefficient (reported as beta[3]), the three age group coefficients (reported as beta[4], beta[5], beta[6] for age groups 2, 3, and 4 respectively), the education coefficient (reported as beta[7]), the gender coefficient (reported as beta[8]), the income coefficient (reported as beta[9]), and the error standard deviation sigma.

answer spec record(beta[1], beta[2], beta[3], beta[4], beta[5], beta[6], beta[7], beta[8], beta[9], sigma)

{
  "kind": "record",
  "fields": {
    "beta[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[3]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[4]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[5]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[6]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[7]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[8]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[9]": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma": {
      "kind": "dist",
      "domain": "real"
    }
  }
}

01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). These are stored draws, not program code; their realized marginals appear in the answer overlay below.

◆ realization 0.022

stan

data {
  int<lower=0> N;
  vector[N] partyid7;
  vector[N] real_ideo;
  vector[N] race_adj;
  vector[N] educ1;
  vector[N] gender;
  vector[N] income;
  array[N] int age_discrete;
}
transformed data {
  vector[N] age30_44; // age as factor
  vector[N] age45_64;
  vector[N] age65up;

  for (n in 1 : N) {
    age30_44[n] = age_discrete[n] == 2;
    age45_64[n] = age_discrete[n] == 3;
    age65up[n] = age_discrete[n] == 4;
  }
}
parameters {
  vector[9] beta;
  real<lower=0> sigma;
}
model {
  // vectorization
  partyid7 ~ normal(beta[1] + beta[2] * real_ideo + beta[3] * race_adj
                    + beta[4] * age30_44 + beta[5] * age45_64
                    + beta[6] * age65up + beta[7] * educ1 + beta[8] * gender
                    + beta[9] * income, sigma);
}

//@ DATA { N: 476, age_discrete: [476 values], educ1: [476 values], gender: [476 values], income: [476 values], partyid7: [476 values], race_adj: [476 values], real_ideo: [476 values] }   // values supplied at runtime
//@ PARAMS ["beta[1]","beta[2]","beta[3]","beta[4]","beta[5]","beta[6]","beta[7]","beta[8]","beta[9]","sigma"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay: reference vs stan · record(beta[1], beta[2], beta[3], beta[4], beta[5], beta[6], beta[7], beta[8], beta[9], sigma)

Each marginal is plotted as overlaid 24-bin histograms of the reference and stan draws over the range shown.

beta[1]: -1.71 to 3.16
beta[2]: 0.62 to 0.93
beta[3]: -2.08 to -0.17
beta[4]: -1.29 to 0.46
beta[5]: -1.57 to 0.18
beta[6]: -1.72 to 0.47
beta[7]: -0.07 to 0.59
beta[8]: -0.61 to 0.51
beta[9]: -0.04 to 0.49
sigma: 1.63 to 1.98

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.0223 ≤ tol 0.0797 · floors 0.0398/0.0326

posteriordb-one_comp_mm_elim_abs / one_comp_mm_elim_abs

answer record(k_a, K_m, V_m, sigma) stan pass 0.4413

00 statement source: posteriordb/one_comp_mm_elim_abs-one_comp_mm_elim_abs

given

For each of N_t = 20 measurement times (in days), the data provide the observed concentration C_hat in mg/L. The pharmacokinetic model also takes three fixed inputs: an initial time t0 (fixed at 0 days), a single administered dose D in mg, and a compartment volume V in liters. The four model parameters, all strictly positive, are the absorption rate constant k_a (units: 1/day), the Michaelis-Menten half-saturation constant K_m (units: mg/L), the maximum elimination rate V_m (units: 1/day), and the measurement error standard deviation sigma (on the log scale, since the observation model is lognormal). Each of k_a, K_m, V_m, and sigma has a half-Cauchy prior with location 0 and scale 1.

model

A one-compartment pharmacokinetic model with first-order absorption and saturable Michaelis-Menten elimination. The concentration in the compartment evolves according to an ordinary differential equation whose right-hand side is the difference of two processes. Absorption is a first-order input term equal to k_a times D / V times exp(-k_a t), so it decays exponentially in time at rate k_a. Elimination is a saturable Michaelis-Menten process whose rate equals (V_m / V) times the concentration divided by (K_m plus the concentration), so it increases with concentration but approaches the maximum V_m/V asymptotically. The initial concentration at time t0 is zero. For each measurement time, the observed concentration is drawn from a lognormal distribution whose log-scale mean is the natural logarithm of the ODE-predicted concentration and whose log-scale standard deviation is sigma.
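Written as equations, the model above reads as follows. $C(t)$ denotes the predicted concentration and $\hat C_i$ the observation at time $t_i$; both are notation for this display, mirroring C_hat and times in the data. The absorption term matches the dose expression in the Stan function one_comp_mm_elim_abs, which sets it to zero at exactly $t = 0$; with $t_0 = 0$ that affects a single instant and does not change the exact solution.

```latex
\begin{aligned}
\frac{dC}{dt} &= \underbrace{\frac{k_a D}{V}\, e^{-k_a t}}_{\text{absorption}}
              \;-\; \underbrace{\frac{V_m}{V}\, \frac{C}{K_m + C}}_{\text{elimination}},
              \qquad C(t_0) = 0, \\
\hat C_i &\sim \operatorname{LogNormal}\big(\log C(t_i),\ \sigma\big), \qquad i = 1, \dots, N_t, \\
\int_0^\infty \frac{k_a D}{V}\, e^{-k_a t}\, dt &= \frac{D}{V}.
\end{aligned}
```

The last line is a quick consistency check: over all time the absorption term delivers a total concentration increment of D / V, that is, the whole dose D spread over the compartment volume V.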

query

The marginal posterior distributions of the four parameters: k_a (the absorption rate constant), K_m (the Michaelis-Menten half-saturation constant), V_m (the maximum elimination rate), and sigma (the measurement error standard deviation).

answer spec record(k_a, K_m, V_m, sigma)

{
  "kind": "record",
  "fields": {
    "k_a": {
      "kind": "dist",
      "domain": "real"
    },
    "K_m": {
      "kind": "dist",
      "domain": "real"
    },
    "V_m": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma": {
      "kind": "dist",
      "domain": "real"
    }
  }
}


01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization 0.441

stan

functions {
  array[] real one_comp_mm_elim_abs(real t, array[] real y,
                                    array[] real theta, array[] real x_r,
                                    array[] int x_i) {
    array[1] real dydt;
    real k_a = theta[1]; // Absorption rate constant in 1/day
    real K_m = theta[2]; // Michaelis-Menten constant in mg/L
    real V_m = theta[3]; // Maximum elimination rate in mg/day
    real D = x_r[1];
    real V = x_r[2];
    real dose = 0;
    real elim = (V_m / V) * y[1] / (K_m + y[1]);

    if (t > 0) {
      dose = exp(-k_a * t) * D * k_a / V;
    }

    dydt[1] = dose - elim;

    return dydt;
  }
}
data {
  real t0; // Initial time in days
  // C0 is currently hardcoded in transformed data;
  // uncomment the next line to get the original model
  // array[1] real C0; // Initial concentration at t0 in mg/L

  real D; // Total dosage in mg
  real V; // Compartment volume in L

  int<lower=1> N_t;
  array[N_t] real times; // Measurement times in days

  // Measured concentrations in effect compartment in mg/L
  array[N_t] real C_hat;
}
transformed data {
  // Comment out the next line to get the original model
  array[1] real C0 = {0.0};
  array[2] real x_r = {D, V};
  array[0] int x_i;
}
parameters {
  real<lower=0> k_a; // Absorption rate constant in 1/day
  real<lower=0> K_m; // Michaelis-Menten constant in mg/L
  real<lower=0> V_m; // Maximum elimination rate in mg/day
  real<lower=0> sigma; // Log-scale measurement error SD
}
transformed parameters {
  array[N_t, 1] real C;
  {
    array[3] real theta = {k_a, K_m, V_m};
    C = integrate_ode_bdf(one_comp_mm_elim_abs, C0, t0, times, theta, x_r,
                          x_i);
  }
}
model {
  // Priors
  k_a ~ cauchy(0, 1);
  K_m ~ cauchy(0, 1);
  V_m ~ cauchy(0, 1);
  sigma ~ cauchy(0, 1);

  // Likelihood
  for (n in 1 : N_t) {
    C_hat[n] ~ lognormal(log(C[n, 1]), sigma);
  }
}
generated quantities {
  array[N_t] real C_ppc;
  for (n in 1 : N_t) {
    C_ppc[n] = lognormal_rng(log(C[n, 1]), sigma);
  }
}

//@ DATA { t0: 0, D: 30, V: 2, times: [20 values], N_t: 20, C_hat: [20 values] }   // values supplied at runtime
//@ PARAMS ["k_a","K_m","V_m","sigma"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay: reference vs stan · record(k_a, K_m, V_m, sigma)

k_a

reference vs stan · 24 bins · 0.53 … 1.05

K_m

reference vs stan · 24 bins · 0.90 … 42.1

V_m

reference vs stan · 24 bins · 0.38 … 3.74

sigma

reference vs stan · 24 bins · 0.08 … 0.24
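Each overlay line reports a bin count and the plotted range shared by the reference and stan draws. One way to build such an overlay is to bin both sample sets on a single common grid and normalize each to a density so the bars are directly comparable. The sketch below assumes the range spans the pooled minimum and maximum of the draws; the page does not state how it chooses the range or the bin count (24 here, 6 to 9 for the tightly concentrated beta marginals further down), so `overlay_histograms` is a hypothetical helper.

```python
def overlay_histograms(reference, candidate, n_bins=24):
    """Bin two sets of posterior draws on one shared grid.
    Returns (edges, ref_density, cand_density); each density integrates to 1."""
    lo = min(min(reference), min(candidate))
    hi = max(max(reference), max(candidate))
    if hi == lo:  # degenerate case: every draw identical
        hi = lo + 1.0
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]

    def density(draws):
        counts = [0] * n_bins
        for x in draws:
            # Clamp so the largest draw lands in the last bin instead of overflowing.
            idx = min(int((x - lo) / width), n_bins - 1)
            counts[idx] += 1
        return [c / (len(draws) * width) for c in counts]

    return edges, density(reference), density(candidate)
```

Normalizing by the number of draws and the bin width keeps the two histograms comparable even when the reference (10 chains) and stan (4 chains) runs contribute different numbers of draws.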

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.4413 ≤ tol 1.8806 · floors 0.9403/0.5151


posteriordb-sblrc / blr

answer record(beta[1], beta[2], beta[3], beta[4], beta[5], sigma) stan pass 0.0035

00 statement source: posteriordb/sblrc-blr

given

For each of N = 100 observations, the data provide D = 5 real-valued predictors (the rows of a design matrix X) and a real-valued response y. The regression has D coefficients, one per predictor, each with a Normal(0, 10) prior (mean 0, standard deviation 10). The error standard deviation sigma, constrained positive, has a half-normal prior: a normal(0, 10) density truncated to the positive reals by the lower bound.

model

Each observation's response is normally distributed with mean equal to the linear combination of its D predictors weighted by the coefficients (X * beta in matrix form), and with a common standard deviation sigma across all observations.
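The model, together with the priors in the realized Stan program (Normal(0, 10) on each coefficient and a half-normal with scale 10 on sigma), defines an unnormalized log posterior. A minimal pure-Python sketch that mirrors the program's three `target +=` statements is shown below. `X` is a list of rows, the function names are hypothetical, and, like Stan's `normal_lpdf`, it keeps the normalizing constants while omitting the constant log 2 from truncating sigma's prior.

```python
import math


def normal_lpdf(x, mu, sd):
    """Log density of Normal(mu, sd) at x, including the normalizing constant."""
    z = (x - mu) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2 * math.pi)


def blr_log_posterior(beta, sigma, X, y):
    """Unnormalized log posterior of the blr model, on the constrained scale."""
    if sigma <= 0:
        return -math.inf  # outside the support of real<lower=0> sigma
    lp = sum(normal_lpdf(b, 0.0, 10.0) for b in beta)  # prior on each coefficient
    lp += normal_lpdf(sigma, 0.0, 10.0)                 # half-normal via the lower bound
    for row, y_n in zip(X, y):                          # likelihood: y ~ normal(X * beta, sigma)
        mean = sum(x_nd * b for x_nd, b in zip(row, beta))
        lp += normal_lpdf(y_n, mean, sigma)
    return lp
```

Stan samples sigma on the log scale and adds the log-Jacobian of that transform automatically; the sketch evaluates the density on the constrained scale, which is what the `target +=` statements express.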

query

The marginal posterior distributions of each of the D+1 parameters: the D regression coefficients beta[1], beta[2], beta[3], beta[4], beta[5], and the error standard deviation sigma.

answer spec record(beta[1], beta[2], beta[3], beta[4], beta[5], sigma)

{
  "kind": "record",
  "fields": {
    "beta[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[3]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[4]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[5]": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma": {
      "kind": "dist",
      "domain": "real"
    }
  }
}


01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization 0.004

stan

data {
  int<lower=0> N;
  int<lower=0> D;
  matrix[N, D] X;
  vector[N] y;
}
parameters {
  vector[D] beta;
  real<lower=0> sigma;
}
model {
  // prior
  target += normal_lpdf(beta | 0, 10);
  target += normal_lpdf(sigma | 0, 10);
  // likelihood
  target += normal_lpdf(y | X * beta, sigma);
}

//@ DATA { y: [100 values], X: [100×5 matrix], D: 5, N: 100 }   // values supplied at runtime
//@ PARAMS ["beta[1]","beta[2]","beta[3]","beta[4]","beta[5]","sigma"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay: reference vs stan · record(beta[1], beta[2], beta[3], beta[4], beta[5], sigma)

beta[1]

reference vs stan · 8 bins · 1.00 … 1.00

beta[2]

reference vs stan · 8 bins · 0.99 … 1.00

beta[3]

reference vs stan · 9 bins · 0.99 … 1.00

beta[4]

reference vs stan · 8 bins · 0.99 … 1.00

beta[5]

reference vs stan · 8 bins · 0.99 … 1.00

sigma

reference vs stan · 24 bins · 0.86 … 1.30

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.0035 ≤ tol 0.0186 · floors 0.0093/0.0041


posteriordb-sblri / blr

answer record(beta[1], beta[2], beta[3], beta[4], beta[5], sigma) stan pass 0.0036

00 statement source: posteriordb/sblri-blr

given

For each of N = 100 observations, the data provide a response variable y and D = 5 predictor values arranged in a design matrix X (100 rows by 5 columns). The five regression coefficients beta[1], beta[2], beta[3], beta[4], beta[5] each have a Normal(0, 10) prior (mean 0, standard deviation 10), as in the realized Stan program. The error standard deviation sigma, constrained positive, has a half-normal prior: a normal(0, 10) density truncated to the positive reals by the lower bound.

model

Each observation's response y[i] is normally distributed with a mean equal to the linear combination of the five predictors weighted by their corresponding coefficients (beta[1] through beta[5]), and a common standard deviation sigma across all observations.
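Conditional on sigma, a Normal(0, tau^2) prior on each coefficient is conjugate to this likelihood, so beta given sigma and y is Gaussian with precision X^T X / sigma^2 + I / tau^2 and mean obtained by solving that precision against X^T y / sigma^2. The realized Stan program uses tau = 10; a flat prior on beta is the limit tau → ∞, which gives the least-squares estimate. A minimal pure-Python sketch under these assumptions, with Gaussian elimination standing in for a linear-algebra library and hypothetical helper names:

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A small and square)."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def conditional_beta_mean(X, y, sigma, tau=10.0):
    """Posterior mean of beta given sigma under beta_d ~ Normal(0, tau^2):
    solve (X^T X / sigma^2 + I / tau^2) m = X^T y / sigma^2."""
    D = len(X[0])
    prec = [[sum(row[i] * row[j] for row in X) / sigma**2 + (1.0 / tau**2 if i == j else 0.0)
             for j in range(D)] for i in range(D)]
    rhs = [sum(row[i] * y_n for row, y_n in zip(X, y)) / sigma**2 for i in range(D)]
    return solve(prec, rhs)
```

In a toy example with X = [[1, 0], [0, 1], [1, 1]] and y = [1, 1, 2], the least-squares answer is (1, 1) and tau = 10 shrinks each component only to 3/3.01 ≈ 0.997. With 100 observations the prior's pull on these marginals is likely smaller still. The full posterior also averages over sigma, so this is a conditional view only.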

query

The marginal posterior distributions of the six parameters: the five regression coefficients (beta[1], beta[2], beta[3], beta[4], beta[5]) and the error standard deviation sigma.

answer spec record(beta[1], beta[2], beta[3], beta[4], beta[5], sigma)

{
  "kind": "record",
  "fields": {
    "beta[1]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[2]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[3]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[4]": {
      "kind": "dist",
      "domain": "real"
    },
    "beta[5]": {
      "kind": "dist",
      "domain": "real"
    },
    "sigma": {
      "kind": "dist",
      "domain": "real"
    }
  }
}


01 realizations comparing reference vs stan

◆ground truth

stored ground truth

Gold reference posterior draws from posteriordb (10 NUTS chains, R-hat ≈ 1). Not program code — the realized marginals are the answer overlay below.

◆realization 0.004

stan

data {
  int<lower=0> N;
  int<lower=0> D;
  matrix[N, D] X;
  vector[N] y;
}
parameters {
  vector[D] beta;
  real<lower=0> sigma;
}
model {
  // prior
  target += normal_lpdf(beta | 0, 10);
  target += normal_lpdf(sigma | 0, 10);
  // likelihood
  target += normal_lpdf(y | X * beta, sigma);
}

//@ DATA { y: [100 values], X: [100×5 matrix], D: 5, N: 100 }   // values supplied at runtime
//@ PARAMS ["beta[1]","beta[2]","beta[3]","beta[4]","beta[5]","sigma"]
//@ SAMPLING {"chains":4,"iter_warmup":1000,"iter_sampling":1000}

02 answer overlay: reference vs stan · record(beta[1], beta[2], beta[3], beta[4], beta[5], sigma)

beta[1]

reference vs stan · 7 bins · 1.00 … 1.00

beta[2]

reference vs stan · 9 bins · 1.00 … 1.00

beta[3]

reference vs stan · 6 bins · 1.00 … 1.00

beta[4]

reference vs stan · 7 bins · 1.00 … 1.00

beta[5]

reference vs stan · 9 bins · 1.00 … 1.01

sigma

reference vs stan · 24 bins · 0.79 … 1.28

03 verification

check	status	evidence
cross-language (stan vs reference)	pass	d=0.0036 ≤ tol 0.0165 · floors 0.0082/0.0034
