proof of concept for posterior_pit#1857
proof of concept for posterior_pit#1857avehtari wants to merge 40 commits intopaul-buerkner:masterfrom
Conversation
paul-buerkner
left a comment
There was a problem hiding this comment.
Thanks! I like the idea. Please find some comments to code structure and naming in my review.
| object, newdata = NULL, re_formula = NULL, re.form = NULL, | ||
| transform = NULL, resp = NULL, negative_rt = FALSE, | ||
| ndraws = NULL, draw_ids = NULL, sort = FALSE, ntrys = 5, | ||
| ndraws = NULL, draw_ids = NULL, sort = FALSE, ntrys = 5, type = "r", |
There was a problem hiding this comment.
Can we think of a more informative name than "type" and more informative options that just "r" and "p"?
There was a problem hiding this comment.
I was thinking that "r", "p", "q", and "d" would naturally refer to how the distribution functions are named in R. Looking at R documentation these could be also named "random", "probability", "quantile", and "density".
I've not been able to come up with better name than "type"
There was a problem hiding this comment.
similarily unspecific. Let' s go with "type" for now until we have a better idea.
There was a problem hiding this comment.
Alternative argument names could be output, return_type, or mode.
| prep <- prepare_predictions( | ||
| object, newdata = newdata, re_formula = re_formula, resp = resp, | ||
| ndraws = ndraws, draw_ids = draw_ids, check_response = FALSE, ... | ||
| ndraws = ndraws, draw_ids = draw_ids, check_response = FALSE, type = type, ... |
There was a problem hiding this comment.
why does prepare predictions need to know the type?
There was a problem hiding this comment.
It's likely that it doesn't, but I was struggling to see the structure of the code, and possibly put it in more places than needed
| mean = mu, sd = sigma, | ||
| lb = prep$data$lb[i], ub = prep$data$ub[i], | ||
| ntrys = ntrys | ||
| switch(type, |
There was a problem hiding this comment.
whe should solve this with a better abstraction. Changing the code of each of the families using switch is unnecessary. Add another layer of a function that handles the two types once and can be called (instead of just rcontinous) in the family-specific functions
There was a problem hiding this comment.
I don't understand. Can explain more or modify this PR to show it with an example?
There was a problem hiding this comment.
Should we have a short zoom call about this?
There was a problem hiding this comment.
yes. will write you on slack
There was a problem hiding this comment.
How about something like the following? We have a helper function that prepares the switching and can be called within the corresponding distributions. In the following a (non-polished) example for a helper function and its function call in the two example distributions (gaussian and student). (Note I changed the argument type to output in the following code... just to get an idea how it would look like)
.predict_continuous_helper <- function(output, prep, i, dist, ntrys, ...) {
lb <- prep$data$lb[i]
ub <- prep$data$ub[i]
if (output == "probability") {
q <- prep$data$Y[i]
return(pcontinuous(
q = q, dist = dist, lb = lb, ub = ub, ntrys = ntrys,
ndraws = prep$ndraws, ...
))
} else if (output == "random") {
return(rcontinuous(
n = prep$ndraws, dist = dist, lb = lb, ub = ub, ntrys = ntrys, ...
))
}
}
posterior_predict_gaussian <- function(i, prep, ntrys = 5, output = "random", ...) {
mu <- get_dpar(prep, "mu", i = i)
sigma <- get_dpar(prep, "sigma", i = i)
sigma <- add_sigma_se(sigma, prep, i = i)
.predict_continuous_helper(
output = output, prep = prep, i = i, ntrys = ntrys,
dist = "norm", mean = mu, sd = sigma
)
}
posterior_predict_student <- function(i, prep, ntrys = 5, output = "random", ...) {
nu <- get_dpar(prep, "nu", i = i)
mu <- get_dpar(prep, "mu", i = i)
sigma <- get_dpar(prep, "sigma", i = i)
sigma <- add_sigma_se(sigma, prep, i = i)
.predict_continuous_helper(
output = output, prep = prep, i = i, ntrys = ntrys,
dist = "student_t", df = nu, mu = mu, sigma = sigma
)
}
| pdist <- paste0("p", dist) | ||
| out <- do_call(pdist, c(list(n), args)) | ||
| } else { | ||
| error("not implemented yet") |
There was a problem hiding this comment.
are you planning to implement this part too as part of this PR?
There was a problem hiding this comment.
I can, if we go forward with the plan
There was a problem hiding this comment.
Okay. Let's discuss the plan first and then move this one forward too
|
Yes this is exactly what I meant!
Florence Bockting ***@***.***> schrieb am Di. 24. Feb. 2026
um 20:06:
… ***@***.**** commented on this pull request.
------------------------------
In R/posterior_predict.R
<#1857 (comment)>:
> mu <- get_dpar(prep, "mu", i = i)
sigma <- get_dpar(prep, "sigma", i = i)
sigma <- add_sigma_se(sigma, prep, i = i)
- rcontinuous(
- n = prep$ndraws, dist = "norm",
- mean = mu, sd = sigma,
- lb = prep$data$lb[i], ub = prep$data$ub[i],
- ntrys = ntrys
+ switch(type,
How about something like the following? We have a helper function that
prepares the switching and can be called within the corresponding
distributions. In the following a (non-polished) example for a helper
function and its function call in the two example distributions (gaussian
and student). (Note I changed the argument type to output in the
following code... just to get an idea how it would look like)
.predict_continuous_helper <- function(output, prep, i, dist, ntrys, ...) {
lb <- prep$data$lb[i]
ub <- prep$data$ub[i]
if (output == "probability") {
q <- prep$data$Y[i]
return(pcontinuous(
q = q, dist = dist, lb = lb, ub = ub, ntrys = ntrys,
ndraws = prep$ndraws, ...
))
} else if (output == "random") {
return(rcontinuous(
n = prep$ndraws, dist = dist, lb = lb, ub = ub, ntrys = ntrys, ...
))
}
}
posterior_predict_gaussian <- function(i, prep, ntrys = 5, output = "random", ...) {
mu <- get_dpar(prep, "mu", i = i)
sigma <- get_dpar(prep, "sigma", i = i)
sigma <- add_sigma_se(sigma, prep, i = i)
.predict_continuous_helper(
output = output, prep = prep, i = i, ntrys = ntrys,
dist = "norm", mean = mu, sd = sigma
)
}
posterior_predict_student <- function(i, prep, ntrys = 5, output = "random", ...) {
nu <- get_dpar(prep, "nu", i = i)
mu <- get_dpar(prep, "mu", i = i)
sigma <- get_dpar(prep, "sigma", i = i)
sigma <- add_sigma_se(sigma, prep, i = i)
.predict_continuous_helper(
output = output, prep = prep, i = i, ntrys = ntrys,
dist = "student_t", df = nu, mu = mu, sigma = sigma
)
}
—
Reply to this email directly, view it on GitHub
<#1857 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ADCW2AAA5QELHJGMCXTWVBL4NSOLVAVCNFSM6AAAAACV2RC4P6VHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZTQNBZHE4TMMJZGE>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
| F_q <- do_call(pdist, c(list(q), args)) | ||
| # include (lb - 1) to treat lb as inclusive | ||
| # this is only relevant for discrete distributions | ||
| F_lb <- do_call(pdist, c(list(lb - 1), args)) |
There was a problem hiding this comment.
Here I am a bit unsure how to treat this correctly. I would have treated the lower bound as exclusive but this would have been inconsistent with the rdiscrete function.
Thus, I set lb - 1 in order to ensure that the lower bound is inclusive but I was uncertain here.
There was a problem hiding this comment.
It should be consistent with how Stan handles it. I am not 100% anymore how it does. Can you double check to match the behavior?
|
I have now updated the proof of concept for 4 functions:
They include two additional new arguments:
The current implementation follows the idea that we can call:
|
I think these should be just F(x) to follow what ppois etc. do. For the same reason, I think that it might be more logical to have non-randomized as the default |
|
Or we could have separately output options |
Okay, one option could be to have for An alternative is to outsource the |
|
I would argue putting everything in one function. The code overhead is
otherwise way too high. I like the probability + randomized argument
approach. The only question is what the default for randomized is? Aki can
you elaborate which use cases you had for much choice of randomized?
Florence Bockting ***@***.***> schrieb am Mo. 16. März 2026
um 07:50:
… *florence-bockting* left a comment (paul-buerkner/brms#1857)
<#1857 (comment)>
Or we could have separately output options probability (not randomized
F(x)) and pit randomized for discrete
Okay, one option could be to have for outcome the values random,
probability, density, quantile, and pit. Whereby pit is only a valid
option for discrete cases. Cons are that there is a conceptual overlap
between pit and probability and pit is only a valid choice for a subclass
of inputs. Pro is that we have all alternatives in one function and no
additional argument (like randomized).
An alternative is to outsource the pit case entirely and to have an
additional function such as posterior_pit(fit_pois) that applies only to
discrete cases and additionally, we have posterior_predict(fit_pois,
outcome = "probability") supporting the default implementations like
ppois(). In my opinion, this option is from a design point relatively
clean, as we don't have the problem that there are "cases" which work/don't
work. However, we would have to introduce an additional function.
—
Reply to this email directly, view it on GitHub
<#1857 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ADCW2ACTEEEYGBYRD56TRK34Q6P4VAVCNFSM6AAAAACV2RC4P6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHM2DANRVGQ3DMNZVHA>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
In this case
|
|
I am fine with option random, probability, density, quantile, and pit (or whatever subset we go with first), as long all all is realized in the same function (posterior_predict). Otherwise code overhead is too high. If we want, we can still have lightweight wrappers around posterior_predict with output argument fixed. |
Alright. We definitely stick with one function - the idea of an extra function is out. |
Alright, go ahead @paul-buerkner |
Just a quick check-in and reminder @paul-buerkner |
paul-buerkner
left a comment
There was a problem hiding this comment.
Getting there, thank you! Here are my comments and change requests
|
Just leave it as is I meant.
Florence Bockting ***@***.***> schrieb am Mo. 13. Apr. 2026
um 08:42:
… ***@***.**** commented on this pull request.
------------------------------
In R/posterior_predict.R
<#1857 (comment)>:
> @@ -92,12 +97,15 @@ posterior_predict.brmsfit <- function(
contains_draws(object)
object <- restructure(object)
prep <- prepare_predictions(
- object, newdata = newdata, re_formula = re_formula, resp = resp,
+ object,
Do you want every argument on a new line or do you make a differentiation
between required and optional arguments, thus:
prepare_predictions(
object,
newdata = newdata, re_formula = re_formula, resp = resp,
ndraws = ndraws, draw_ids = draw_ids, check_response = FALSE, ...
)
or
prepare_predictions(
object,
newdata = newdata,
re_formula = re_formula,
resp = resp,
ndraws = ndraws,
draw_ids = draw_ids,
check_response = FALSE,
...
)
—
Reply to this email directly, view it on GitHub
<#1857 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ADCW2AHMN2BOUSYKZPT2ZE34VSD63AVCNFSM6AAAAACV2RC4P6VHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHM2DAOJWHAYTANJXGU>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
…n posterior_predict
|
Would it be too much trouble to allow argument |
|
@paul-buerkner I added now for the families (that we used as current test cases) support for "quantile" and "density" Thereby I realised that there was no implementation for |
|
Just lack of time to implement. If you implement qzero_inflated and
qhurdle_ it should be agnostic to the family this is combined with
Florence Bockting ***@***.***> schrieb am Mo. 27. Apr. 2026
um 09:31:
… *florence-bockting* left a comment (paul-buerkner/brms#1857)
<#1857 (comment)>
@paul-buerkner <https://github.com/paul-buerkner> I added now for the
families (that we used as current test cases) support for "quantile" and
"density"
Thereby I realised that there was no implementation for qbeta_binomial
and qzero_inflated_negbinomial.
For the start, I added an implementation for both families but there might
be a good reason for the fact that there was no implementation (e.g. no
closed form available and only numeric approx. possible). So, I just wanted
to point this out and check with you. I am also fine with removing the
implementation of these two families again and rather provide for these
cases a "Not implemented" error.
—
Reply to this email directly, view it on GitHub
<#1857 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ADCW2AA274H7TRO3GXXLTOL4X4EDJAVCNFSM6AAAAACV2RC4P6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHM2DGMRVGA2TCNRQGU>
.
Triage notifications, keep track of coding agent tasks and review pull
requests on the go with GitHub Mobile for iOS
<https://github.com/notifications/mobile/ios/ADCW2ACLX4AJAYVPK6EMR3L4X4EDJA5CNFSNUABFM5UWIORPF5TWS5BNNB2WEL2JONZXKZKDN5WW2ZLOOQXTIMZSGUYDKMJWGA22M4TFMFZW63VHNVSW45DJN5XKKZLWMVXHJKTGN5XXIZLSL5UW64Y>
and Android
<https://github.com/notifications/mobile/android/ADCW2AC5VEKCGHMHRU2FTBT4X4EDJA5CNFSNUABFM5UWIORPF5TWS5BNNB2WEL2JONZXKZKDN5WW2ZLOOQXTIMZSGUYDKMJWGA22M4TFMFZW63VHNVSW45DJN5XKKZLWMVXHJLTGN5XXIZLSL5QW4ZDSN5UWI>.
Download it today!
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
output support in R/posterior_predict.R (click to expand) |
| method | random | probability* | density* | pit* | quantile* | notes |
|---|---|---|---|---|---|---|
gaussian |
Y | Y | Y | Y | Y | |
student |
Y | Y | Y | Y | Y | |
lognormal |
Y | Y | Y | Y | Y | |
shifted_lognormal |
Y | |||||
skew_normal |
Y | |||||
gaussian_mv |
Y | |||||
student_mv |
Y | |||||
gaussian_time |
Y | |||||
student_time |
Y | |||||
gaussian_lagsar |
Y | |||||
student_lagsar |
Y | |||||
gaussian_errorsar |
Y | |||||
student_errorsar |
Y | |||||
gaussian_fcor |
Y | |||||
student_fcor |
Y | |||||
binomial |
Y | Y | Y | Y | Y | |
beta_binomial |
Y | Y | Y | Y | Y | |
bernoulli |
Y | Y | Y | Y | Y | |
poisson |
Y | Y | Y | Y | Y | |
negbinomial |
Y | Y | Y | Y | Y | |
negbinomial2 |
Y | Y | Y | Y | Y | |
geometric |
Y | |||||
discrete_weibull |
Y | |||||
com_poisson |
Y | |||||
exponential |
Y | |||||
gamma |
Y | Y | Y | Y | Y | |
weibull |
Y | Y | Y | Y | Y | |
frechet |
Y | |||||
gen_extreme_value |
Y | |||||
inverse.gaussian |
Y | |||||
exgaussian |
Y | |||||
wiener |
Y | |||||
beta |
Y | Y | Y | Y | Y | Y |
xbeta |
Y | |||||
von_mises |
Y | |||||
asym_laplace |
Y | |||||
zero_inflated_asym_laplace |
Y | |||||
cox |
explicitly unsupported | |||||
hurdle_poisson |
Y | |||||
hurdle_negbinomial |
Y | |||||
hurdle_gamma |
Y | |||||
hurdle_lognormal |
Y | |||||
hurdle_cumulative |
Y | |||||
zero_inflated_beta |
Y | Y | Y | Y | Y | |
zero_one_inflated_beta |
Y | |||||
zero_inflated_poisson |
Y | Y | Y | Y | Y | |
zero_inflated_negbinomial |
Y | Y | Y | Y | Y | |
zero_inflated_binomial |
Y | Y | Y | Y | Y | |
zero_inflated_beta_binomial |
Y | Y | Y | Y | Y | |
categorical |
Y | |||||
multinomial |
Y | |||||
dirichlet_multinomial |
Y | |||||
dirichlet |
Y | |||||
dirichlet2 |
Y | |||||
logistic_normal |
Y | |||||
cumulative |
Y | |||||
sratio |
Y | |||||
cratio |
Y | |||||
acat |
Y | |||||
ordinal |
Y | |||||
custom |
delegated to custom family method | |||||
mixture |
depends | depends | depends | depends | depends | delegated to component families |
R/distributions.R; click to expand) |
| Family | ddist |
pdist |
qdist |
rdist |
notes |
|---|---|---|---|---|---|
acat |
Y | ||||
asym_laplace |
Y | Y | Y | Y | |
beta_binomial |
Y | Y | Y* | Y | |
categorical |
Y | Y | |||
com_poisson |
Y | Y | Y | ||
cox |
Y | Y | |||
cratio |
Y | ||||
cumulative |
Y | ||||
dirichlet |
Y | Y | |||
dirichletmultinomial |
Y | ||||
discrete_weibull |
Y | Y | Y | Y | |
exgaussian |
Y | Y | Y | ||
frechet |
Y | Y | Y | Y | |
gen_extreme_value |
Y | Y | Y | Y | |
hurdle_gamma |
Y | Y | |||
hurdle_lognormal |
Y | Y | |||
hurdle_negbinomial |
Y | Y | |||
hurdle_poisson |
Y | Y | |||
inv_gaussian |
Y | Y | Y | ||
invgamma |
Y | ||||
logistic_normal |
Y | Y | |||
multi_normal |
Y | Y | |||
multi_student_t |
Y | Y | |||
multinomial |
Y | ||||
ordinal |
Y | ||||
shifted |
Y | Y | Y | Y | |
shifted_lnorm |
Y | Y | Y | Y | |
skew_normal |
Y | Y | Y | Y | |
sratio |
Y | ||||
student_t |
Y | Y | Y | Y | |
von_mises |
Y | Y | Y | ||
wiener |
Y | Y | |||
xbeta |
Y | Y | Y | Y | |
zero_inflated_asym_laplace |
Y | Y | |||
zero_inflated_beta |
Y | Y | Y* | ||
zero_inflated_beta_binomial |
Y | Y | Y* | ||
zero_inflated_binomial |
Y | Y | Y* | ||
zero_inflated_negbinomial |
Y | Y | Y* | ||
zero_inflated_poisson |
Y | Y | Y* |
This is a proof of concept branch and far from ready to be merged, but easier to show a possible way to solve issue #1855
To avoid duplication of code in brms, the idea is to use
posterior_predict(), but add a new argumenttypewith default value"r"referring to random draws. Newtype="p"would instead return CDFs / PITs.These PIT values are more useful than the ones based on ranks.
The proof of concept works with families "gaussian" and "student_t" without truncation. If the general idea is acceptable then support for the rest of the families and truncation can be added.
Adding here that natural other types are "d" for posterior predictive densities
and
"q" for quantiles that are more accurate than current rank based quantiles (used e.g. by predictive_interval())