add subsampling support for external backends#1870
add subsampling support for external backends#1870vandenman wants to merge 1 commit intopaul-buerkner:masterfrom
Conversation
|
Thank you for opening this PR and for your interest in extending brms! How general would this feature be? You implemented as an example but the goal would be to generalize it to all models that support it? Some kind of subsetting feature is already implemeneted in brms for within-chain parallelization (aka threading). Perhaps some code can be reused. Personally, after seeing the PR and the interfaces, I am a bit hesistant to have this natively in brms because it would add quite a bit of maintenence burden. How complicated would it be to build a thin brms wrapper package instead that supports this algorithm for a selected subset of models (depending on your time to support more models)? |
The subsampling is easiest to do for (general) linear models. I'm planning to also apply this to more advanced models that
I completely understand, and perhaps there is a slight misunderstanding here because I did not intend for The way this currently works, and how I would envision users using this as well, is that they call I'll look into reusing some more of the existing code to minimize the changes this week. In case you believe this would still be too much of a maintenance burden regardless, I completely understand and feel free to close the PR. |
|
I see, makes sense. Still, I would not like brms-main to have partial support for a feature for a small subset of models without this functionality being used directly in brms (but only for an outside package). For the time being, would it make sense to let this run on a brms branch? There you could play around and let users know they need to install this branch. It could either be on your fork of brms or on the official repo (here) but on a separate branch. If you prefer the latter, I can make a new branch here and you can then make the PR against that. |
Hi! I'm working on PDMPSamplersR, an R interface to the Julia package PDMPSamplers.jl, we briefly were in touch via email and I finally found the time to create a clean-ish PR.
PDMPs (Piecewise Deterministic Markov Processes) are continuous-time MCMC algorithms that remain exact when we replace the gradient by an unbiased estimator. A common approach is to take subsamples of the full dataset. This is particularly applicable to GLMs and thus brms because we can replace
yandXby a random subsample and everything still works.From a user perspective, this PR adds a
subsampling()function and adds the argumentsubsampletostancode(). For a mixed-effects model likey ~ x1 + x2 + (1 | group) + (1 | subgroup), the external backend (i.e., PDMPSamplersR) calls:and the generated model block changes from:
model { if (!prior_only) { - vector[N] mu = rep_vector(0.0, N); + vector[pdmp_get_subsample_size()] mu = rep_vector(0.0, pdmp_get_subsample_size()); mu += Intercept; - for (n in 1:N) { + for (n in 1:pdmp_get_subsample_size()) { + int nn = pdmp_get_subsample_index(n); - mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_2_1[J_2[n]] * Z_2_1[n]; + mu[n] += r_1_1[J_1[nn]] * Z_1_1[nn] + r_2_1[J_2[nn]] * Z_2_1[nn]; } - target += bernoulli_logit_glm_lpmf(Y | Xc, mu, b); + target += bernoulli_logit_glm_lpmf(get_subsampled_Y_int(Y) | get_subsampled_Xc(Xc), mu, b); }The external C++ functions (
pdmp_get_subsample_size,pdmp_get_subsample_index,get_subsampled_Y_int,get_subsampled_Xc) are provided by the backend viastanvarsand compiled into the Stan model (that part worked surprisingly easy, all the necessary stuff was already in brms). While sampling, the Julia code calls these C++ functions to resample the active indices. At first I passed new data to a compiled Stan model via BrideStan, which would not require changes to brms, but that turned out to too slow to be useful, hence this PR. I've seen ideas for subsampling with Stan before on GitHub (e.g., stan-dev/gmo#3), so I tried to write this in a way that it could also benefit others.Here is a small benchmark on a Bernoulli logistic regression (N=500, excluding compilation time):
Fixed effects (
y ~ x1 + ... + x10, m=50):Mixed effects (
y ~ x1 + x2 + (1 | group) + (1 | subgroup), 20 groups, 5 subgroups, m=50):The gradient evaluations for subsampled methods are scaled by m/N to reflect the per observation cost. Furthermore, the ESS for the PDMPs is computed using a continuous time algorithm rather than what rstan does, so take these numbers with a grain of salt. I think they look promising, but I'm not 100% sure yet if it's really a fair comparison.
Reprex code
Note: to run this you will need to install Julia as well.
Further ideas
Not part of this PR, but could reduce coupling between brms and external backends:
empty_fit$fit <- stanfit; brms::rename_pars(empty_fit). Some kind of helper function could make that smoother.Let me know what you think!