Skip to content

Fix Bayesian R2 computation for normal and Bernoulli#1880

Open
florence-bockting wants to merge 6 commits intomasterfrom
fix-bayesian-r2
Open

Fix Bayesian R2 computation for normal and Bernoulli#1880
florence-bockting wants to merge 6 commits intomasterfrom
fix-bayesian-r2

Conversation

@florence-bockting
Copy link
Copy Markdown
Collaborator

@florence-bockting florence-bockting commented Apr 18, 2026

Description

This PR refers to Issue#1815.
Update bayes_R2() to use model-based residual variance for supported families (Gaussian and Bernoulli), while preserving fallback behavior (i.e., residual-based R2) for unsupported families.

Note, changes in this PR might be breaking as the output of the plot might change. However, the change in computation is rather considered as a "bug" and this PR as a fix. For safety reasons I run a reverse dependency check for this PR which indicated no issues with dependent packages.

Changes

  • Refactored bayes_R2.brmsfit() to compute R2 via model-based residual variance for:
    • Gaussian models (E[sigma^2] per draw)
    • Bernoulli models (Tjur pseudo-variance: E[pi * (1 - pi)] per draw)
  • Added explicit fallback to residual-based computation for other families.
  • Added a warning when fallback is used, emitted once per family.
  • Split internals into helper functions for clarity:
    • .bayes_R2_res()
    • .bayes_R2_model()
    • .bayes_R2_var_res_gaussian()
    • .bayes_R2_var_res_bernoulli()

TODO

  • update function documentation
  • update tests
  • add release note in NEWS.md
  • run reverse dependency check

@florence-bockting
Copy link
Copy Markdown
Collaborator Author

@avehtari can you check the current implementation?
Also: You mention in the Issue:

for other families, add pseudo-variance when adding pseudo-variance support for R2D2 type priors

This part is not yet implemented as I read it as a task for future implementations but I am not sure whether I misread this point - therefore I want to point this explicitly out here.

@florence-bockting florence-bockting marked this pull request as ready for review April 24, 2026 08:02
@florence-bockting
Copy link
Copy Markdown
Collaborator Author

@avehtari Can you please check the correctness of the computation. Thank you.

Comment thread R/bayes_R2.R Outdated

.bayes_R2_var_res_gaussian <- function(sigma, ...) {
sigma <- as.matrix(sigma)
matrixStats::rowMeans2(sigma^2)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why there is rowMeans. The paper and the online appendix provided equations only for scalar sigma, and I think just taking mean for vector sigma in heteroscedastic case is not correct. Even if more correct approach would be too complicated to implement, at least this approximation should be documented.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for pointing this out. I decided now for the following approach: check whether sigma is heteroscedastic. If true, a warning is provided and R2 computation falls back to residual-based R2.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussing this point again, we (Aki, Florence) decided to use also for the heteroscedastic case the mean of sigma as an approximation as this is consistent with the discussion provided in Tjur (2009). A corresponding note is added in the function documentation for the user as well as in the code as a comment for the developer.

To sum up, we use model-based variances for (homo- and heteroscedastic) Gaussian and Bernoulli models. Otherwise residual-based variances are used and the user warned.

Copy link
Copy Markdown
Owner

@paul-buerkner paul-buerkner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall. just a few minor comments

Comment thread R/bayes_R2.R
Comment thread NEWS.md
Comment thread R/bayes_R2.R Outdated
}
R2[[i]] <- .bayes_R2(y, ypred)
family_name <- family(object, resp = resp[i])$family
if (family_name %in% "gaussian") {
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this pattern works for now. If we later on support the model-based approach for more than two families, we should provide separate functions per family, similar to how we do it with a lot of other post-processing funcftions.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored the code and outsourced the family-specific computations for better maintainability and easier future updating. Thank you for pointing this out.

@paul-buerkner
Copy link
Copy Markdown
Owner

paul-buerkner commented Apr 29, 2026 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants