-
-
Notifications
You must be signed in to change notification settings - Fork 220
proof of concept for posterior_pit #1857
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
avehtari
wants to merge
40
commits into
paul-buerkner:master
Choose a base branch
from
avehtari:posterior_pit
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from 15 commits
Commits
Show all changes
40 commits
Select commit
Hold shift + click to select a range
13427fe
proof of concept for posterior_pit
avehtari 1b046eb
switch 'type' to 'output'; add wrapper function
florence-bockting 5102613
add test for posterior_predict with output arg
florence-bockting bbfd28c
fix 'probability' method for posterior_predict
florence-bockting 780ad34
add test for posterior_predict_gaussian
florence-bockting 3106b93
fix setting of q
florence-bockting 232cf1d
add failing test if q is assigned wrongly
florence-bockting a898b24
add cdf for truncated cont. distr
florence-bockting 94cb02a
add test for truncated posterior_predict_gaussian
florence-bockting 18667c1
add test for posterior_predict_student
2849ae3
remove dot from predict_continuous_helper for consistency
f5012d8
support of posterior_predict for discrete distributions
456d4af
add test for posterior_predict_binomial
a6f2beb
add posterior_predict_poisson with support of diff. output values
41f5de0
add test for posterior_predict_poisson
070016a
ignore agent skills
ee40ef6
update posterior_predict() with output argument
6a0a2ce
update posterior_predict with outcome values probability, random, pit
5a03d33
simplify switch case
7220a6d
adjust code style
cad4b8e
refactor: remove unnessary wrapper
462cc30
chore: update .gitignore to include skills
6cdf33b
build(deps): remove truncnorm and dplyr from Suggests
ef2520f
docs: update vignette for posterior_predict to use data.frame instead…
ed6df11
tests: remove truncnorm dependency and explicit naming of default values
45b6600
style,docs: undo style changes, adjust argument checking and naming i…
2b364ac
style: undo change in indentation style in docs example
db26e89
feature: update beta-binomial with new posterior_predict functionality
2fbcea7
docs: add beta-binomial example to posterior_predict vignette
49f4d97
feature: update negbinomial with new posterior_predict functionality
12faea1
docs: add negbinomial example to posterior_predict vignette
c55c538
chore: add packages from Suggests to dependency install GitHub Action
5011361
feature: update zero-inflated negbinomial with new posterior_predict …
ccc9cfe
fix: pass q (quantile) as argument in posterior_predict
81c4e22
feature: add lower.tail and log.p to compute_cdf
a881609
fix: remove log.p and lower.tail from 'random'
35833f3
fix: set ntrys as optional in predict_discrete_helper
54cfbb0
docs: add lower.tail and log.p documentation to posterior_predict
3607f7b
feature: add output 'density', 'quantile' to selected distribution fa…
9b9f986
feat: add outcome support for additional families
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Some comments aren't visible on the classic Files Changed page.
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -82,7 +82,7 @@ | |
| posterior_predict.brmsfit <- function( | ||
| object, newdata = NULL, re_formula = NULL, re.form = NULL, | ||
| transform = NULL, resp = NULL, negative_rt = FALSE, | ||
| ndraws = NULL, draw_ids = NULL, sort = FALSE, ntrys = 5, | ||
| ndraws = NULL, draw_ids = NULL, sort = FALSE, ntrys = 5, output = "random", | ||
| cores = NULL, ... | ||
| ) { | ||
| cl <- match.call() | ||
|
|
@@ -97,7 +97,7 @@ posterior_predict.brmsfit <- function( | |
| ) | ||
| posterior_predict( | ||
| prep, transform = transform, sort = sort, ntrys = ntrys, | ||
| negative_rt = negative_rt, cores = cores, summary = FALSE | ||
| negative_rt = negative_rt, cores = cores, summary = FALSE, output = output | ||
| ) | ||
| } | ||
|
|
||
|
|
@@ -136,7 +136,7 @@ posterior_predict.brmsprep <- function(object, transform = NULL, sort = FALSE, | |
| pp_fun <- paste0("posterior_predict_", object$family$fun) | ||
| pp_fun <- get(pp_fun, asNamespace("brms")) | ||
| N <- choose_N(object) | ||
| out <- plapply(seq_len(N), pp_fun, .cores = cores, prep = object, ...) | ||
| out <- plapply(seq_len(N), pp_fun, .cores = cores, prep = object, output = output, ...) | ||
| if (grepl("_mv$", object$family$fun)) { | ||
| out <- do_call(abind, c(out, along = 3)) | ||
| out <- aperm(out, perm = c(1, 3, 2)) | ||
|
|
@@ -303,34 +303,32 @@ validate_pp_method <- function(method) { | |
|
|
||
| # ------------------- family specific posterior_predict methods --------------------- | ||
| # All posterior_predict_<family> functions have the same arguments structure | ||
| # @param i index of the observatio for which to compute pp values | ||
| # @param i index of the observation for which to compute pp values | ||
| # @param prep A named list returned by prepare_predictions containing | ||
| # all required data and posterior draws | ||
| # @param ... ignored arguments | ||
| # @param A vector of length prep$ndraws containing draws | ||
| # from the posterior predictive distribution | ||
| posterior_predict_gaussian <- function(i, prep, ntrys = 5, ...) { | ||
| posterior_predict_gaussian <- function(i, prep, ntrys = 5, output = "random", ...) { | ||
| mu <- get_dpar(prep, "mu", i = i) | ||
| sigma <- get_dpar(prep, "sigma", i = i) | ||
| sigma <- add_sigma_se(sigma, prep, i = i) | ||
| rcontinuous( | ||
| n = prep$ndraws, dist = "norm", | ||
| mean = mu, sd = sigma, | ||
| lb = prep$data$lb[i], ub = prep$data$ub[i], | ||
| ntrys = ntrys | ||
|
|
||
| predict_continuous_helper( | ||
| output = output, prep = prep, i = i, ntrys = ntrys, | ||
| dist = "norm", mean = mu, sd = sigma, ... | ||
| ) | ||
| } | ||
|
|
||
| posterior_predict_student <- function(i, prep, ntrys = 5, ...) { | ||
| posterior_predict_student <- function(i, prep, ntrys = 5, output = "random", ...) { | ||
| nu <- get_dpar(prep, "nu", i = i) | ||
| mu <- get_dpar(prep, "mu", i = i) | ||
| sigma <- get_dpar(prep, "sigma", i = i) | ||
| sigma <- add_sigma_se(sigma, prep, i = i) | ||
| rcontinuous( | ||
| n = prep$ndraws, dist = "student_t", | ||
| df = nu, mu = mu, sigma = sigma, | ||
| lb = prep$data$lb[i], ub = prep$data$ub[i], | ||
| ntrys = ntrys | ||
|
|
||
| predict_continuous_helper( | ||
| output = output, prep = prep, i = i, ntrys = ntrys, | ||
| dist = "student_t", df = nu, mu = mu, sigma = sigma, ... | ||
| ) | ||
| } | ||
|
|
||
|
|
@@ -483,13 +481,13 @@ posterior_predict_student_fcor <- function(i, prep, ...) { | |
| rblapply(seq_len(prep$ndraws), .predict) | ||
| } | ||
|
|
||
| posterior_predict_binomial <- function(i, prep, ntrys = 5, ...) { | ||
| rdiscrete( | ||
| n = prep$ndraws, dist = "binom", | ||
| size = prep$data$trials[i], | ||
| prob = get_dpar(prep, "mu", i = i), | ||
| lb = prep$data$lb[i], ub = prep$data$ub[i], | ||
| ntrys = ntrys | ||
| posterior_predict_binomial <- function(i, prep, ntrys = 5, output = "random", ...) { | ||
| mu = get_dpar(prep, "mu", i = i) | ||
| size = prep$data$trials[i] | ||
|
|
||
| predict_discrete_helper( | ||
| output = output, i = i, prep = prep, ntrys = ntrys, | ||
| dist = "binom", prob = mu, size = size, ... | ||
| ) | ||
| } | ||
|
|
||
|
|
@@ -509,14 +507,15 @@ posterior_predict_bernoulli <- function(i, prep, ...) { | |
| rbinom(length(mu), size = 1, prob = mu) | ||
| } | ||
|
|
||
| posterior_predict_poisson <- function(i, prep, ntrys = 5, ...) { | ||
| posterior_predict_poisson <- function(i, prep, ntrys = 5, output = "random", ...) { | ||
| mu <- get_dpar(prep, "mu", i) | ||
| mu <- multiply_dpar_rate_denom(mu, prep, i = i) | ||
| rdiscrete( | ||
| n = prep$ndraws, dist = "pois", lambda = mu, | ||
| lb = prep$data$lb[i], ub = prep$data$ub[i], | ||
| ntrys = ntrys | ||
|
|
||
| predict_discrete_helper( | ||
| output = output, i = i, prep = prep, ntrys = ntrys, | ||
| dist = "pois", lambda = mu, ... | ||
| ) | ||
|
|
||
| } | ||
|
|
||
| posterior_predict_negbinomial <- function(i, prep, ntrys = 5, ...) { | ||
|
|
@@ -1002,6 +1001,32 @@ rcontinuous <- function(n, dist, ..., lb = NULL, ub = NULL, ntrys = 5) { | |
| out | ||
| } | ||
|
|
||
| pcontinuous <- function(q, dist, ..., lb = NULL, ub = NULL, ntrys = 5) { | ||
| args <- list(...) | ||
| pdist <- paste0("p", dist) | ||
|
|
||
| if (is.null(lb) && is.null(ub)) { | ||
| # non-truncated case | ||
| out <- do_call(pdist, c(list(q), args)) | ||
| } else { | ||
| # truncated case | ||
| F_q <- do_call(pdist, c(list(q), args)) | ||
| F_lb <- do_call(pdist, c(list(lb), args)) | ||
| F_ub <- do_call(pdist, c(list(ub), args)) | ||
|
|
||
| scale_factor <- F_ub - F_lb | ||
|
|
||
| # compute truncated CDF: (F(q) - F(lb)) / (F(ub) - F(lb)) | ||
| out <- dplyr::case_when( | ||
| q < lb ~ 0, | ||
| q > ub ~ 1, | ||
| (F_ub - F_lb) == 0 ~ 1, | ||
| TRUE ~ (F_q - F_lb) / (F_ub - F_lb) | ||
| ) | ||
| } | ||
| out | ||
| } | ||
|
|
||
| # random numbers from (possibly truncated) discrete distributions | ||
| # currently rejection sampling is used for truncated distributions | ||
| # @param n number of random values to generate | ||
|
|
@@ -1034,6 +1059,44 @@ rdiscrete <- function(n, dist, ..., lb = NULL, ub = NULL, ntrys = 5) { | |
| out | ||
| } | ||
|
|
||
| # probability values from (possibly truncated) discrete distributions | ||
| # Note: lb and ub are treated as inclusive in order to be consistent with the | ||
| # behavior of rdiscrete. | ||
| # @param q quantile value(s) for which to compute the CDF | ||
| # @param dist name of a distribution for which the functions | ||
| # @param ... additional arguments passed to the distribution functions | ||
| # @param lb optional lower truncation bound (inclusive) | ||
| # @param ub optional upper truncation bound | ||
| # @param ntrys number of trys in rejection sampling for truncated models | ||
| # @return a vector of probability values | ||
| pdiscrete <- function(q, dist, ..., lb = NULL, ub = NULL, ntrys = 5) { | ||
| args <- list(...) | ||
| pdist <- paste0("p", dist) | ||
|
|
||
| if (is.null(lb) && is.null(ub)) { | ||
| # non-truncated case | ||
| out <- do_call(pdist, c(list(q), args)) | ||
| } else { | ||
| # truncated case (treat lb as inclusive) | ||
| F_q <- do_call(pdist, c(list(q), args)) | ||
| # include (lb - 1) to treat lb as inclusive | ||
| # this is only relevant for discrete distributions | ||
| F_lb <- do_call(pdist, c(list(lb - 1), args)) | ||
| F_ub <- do_call(pdist, c(list(ub), args)) | ||
|
|
||
| scale_factor <- F_ub - F_lb | ||
|
|
||
| # compute truncated CDF: (F(q) - F(lb)) / (F(ub) - F(lb)) | ||
| out <- dplyr::case_when( | ||
| q < lb ~ 0, | ||
| q > ub ~ 1, | ||
| (F_ub - F_lb) == 0 ~ 1, | ||
| TRUE ~ (F_q - F_lb) / (F_ub - F_lb) | ||
| ) | ||
| } | ||
| out | ||
| } | ||
|
|
||
| # sample from the IDs of the mixture components | ||
| sample_mixture_ids <- function(theta) { | ||
| stopifnot(is.matrix(theta)) | ||
|
|
@@ -1085,3 +1148,65 @@ check_discrete_trunc_bounds <- function(x, lb = NULL, ub = NULL, thres = 0.01) { | |
| } | ||
| round(x) | ||
| } | ||
|
|
||
| # predict random numbers or probability values from continuous distributions | ||
| # @param output "probability" or "random" | ||
| # @param prep A named list returned by prepare_predictions containing | ||
| # all required data and posterior draws | ||
| # @param i index of the observation for which to compute pp values | ||
| # @param dist name of the distribution | ||
| # @param ntrys number of trys in rejection sampling for truncated models | ||
| # @param q optional custom quantile value; if NULL, the default is prep$data$Y[i] | ||
| # @param ... additional arguments passed to the distribution functions | ||
| # @return a vector of draws | ||
| predict_continuous_helper <- function(output, prep, i, dist, ntrys, q = NULL, ...) { | ||
| lb <- prep$data$lb[i] | ||
| ub <- prep$data$ub[i] | ||
|
|
||
| switch(output, | ||
| "probability" = { | ||
| if (is.null(q)) { | ||
|
Owner
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can you explain this default? or asked differently, when do we want to devidate from that? |
||
| q <- prep$data$Y[i] | ||
| } | ||
| pcontinuous( | ||
| q = q, dist = dist, lb = lb, ub = ub, ntrys = ntrys, ... | ||
| ) | ||
| }, | ||
| "random" = { | ||
|
florence-bockting marked this conversation as resolved.
Outdated
|
||
| rcontinuous( | ||
| n = prep$ndraws, dist = dist, lb = lb, ub = ub, ntrys = ntrys, ... | ||
| ) | ||
| } | ||
| ) | ||
| } | ||
|
|
||
| # predict random numbers or probability values from discrete distributions | ||
| # @param output "probability" or "random" | ||
| # @param prep A named list returned by prepare_predictions containing | ||
| # all required data and posterior draws | ||
| # @param i index of the observation for which to compute pp values | ||
| # @param dist name of the distribution | ||
| # @param ntrys number of trys in rejection sampling for truncated models | ||
| # @param q optional custom quantile value; if NULL, the default is prep$data$Y[i] | ||
| # @param ... additional arguments passed to the distribution functions | ||
| # @return a vector of draws | ||
| predict_discrete_helper <- function(output, prep, i, dist, ntrys, q = NULL, ...) { | ||
| lb <- prep$data$lb[i] | ||
| ub <- prep$data$ub[i] | ||
|
|
||
| switch(output, | ||
|
florence-bockting marked this conversation as resolved.
|
||
| "probability" = { | ||
| if (is.null(q)) { | ||
| q <- prep$data$Y[i] | ||
| } | ||
| pdiscrete( | ||
| q = q, dist = dist, lb = lb, ub = ub, ntrys = ntrys, ... | ||
| ) | ||
| }, | ||
| "random" = { | ||
| rdiscrete( | ||
| n = prep$ndraws, dist = dist, lb = lb, ub = ub, ntrys = ntrys, ... | ||
| ) | ||
| } | ||
| ) | ||
| } | ||
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I am a bit unsure how to treat this correctly. I would have treated the lower bound as exclusive but this would have been inconsistent with the
rdiscretefunction.Thus, I set
lb - 1in order to ensure that the lower bound is inclusive but I was uncertain here.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be consistent with how Stan handles it. I am not 100% anymore how it does. Can you double check to match the behavior?