Use threads for within-chain parallelization in Stan via the brms
interface. Within-chain parallelization is experimental! We recommend its use
only if you are experienced with Stan's reduce_sum function and have a
slow-running model that cannot be sped up by any other means.
threading(threads = NULL, grainsize = NULL, static = FALSE)
threads: Number of threads to use in within-chain parallelization.
grainsize: Number of observations evaluated together in one chunk on
one of the CPUs used for threading. If NULL (the default), grainsize is
currently chosen as max(100, N / (2 * threads)), where N is the number of
observations in the data. This default is experimental and may change in
the future without prior notice.
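To make the default rule concrete, here is a minimal sketch that simply evaluates the documented formula. The helper name default_grainsize is hypothetical and not part of brms; brms computes the value internally and may round it differently.

```r
# Hypothetical helper (NOT a brms function) that evaluates the documented
# default: grainsize = max(100, N / (2 * threads)).
# brms may round the value internally; this sketch does not.
default_grainsize <- function(N, threads) {
  max(100, N / (2 * threads))
}

# Small data set: the lower bound of 100 observations per chunk applies
default_grainsize(N = 236, threads = 2)    # 236 / 4 = 59, so 100
# Larger data set: N / grainsize = 8 chunks, i.e. about two per thread
default_grainsize(N = 10000, threads = 4)  # 10000 / 8 = 1250
```

The lower bound keeps chunks from becoming so small that scheduling overhead dominates, while the N / (2 * threads) term aims for about two chunks per thread on larger data.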
static: Logical. Apply the static (non-adaptive) version of reduce_sum?
Defaults to FALSE. Setting it to TRUE is required to achieve exact
reproducibility of the model results (provided the random seed is set as
well).
A brmsthreads object that can be passed to the threads argument of brm
and related functions.
The adaptive scheduling procedure used by reduce_sum prevents the results
from being exactly reproducible, even if you set the random seed. If you need
exact reproducibility, set static = TRUE, which may somewhat reduce
efficiency.
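A minimal sketch of a reproducible setup, combining static = TRUE with a fixed seed passed to brm (the seed value 1234 is an arbitrary choice). The model call mirrors the illustration used elsewhere on this page and is wrapped in if (FALSE) because it requires a working cmdstanr installation and compiles a Stan model.

```r
library(brms)

# Static (non-adaptive) reduce_sum scheduling: required for exact
# reproducibility, possibly at some cost in efficiency
th <- threading(threads = 2, grainsize = 100, static = TRUE)

if (FALSE) {
  # Fixing the seed as well should make repeated fits reproducible
  fit <- brm(count ~ zAge + zBase * Trt + (1|patient),
             data = epilepsy, family = negbinomial(),
             chains = 1, threads = th, seed = 1234,
             backend = "cmdstanr")
}
```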
To ensure that chunks (whose size is defined by grainsize) require roughly
the same amount of computing time, we recommend storing observations in
random order in the data. At the very least, avoid sorting observations by
the response values: such sorting often causes variation in the computing
time of the pointwise log-likelihood across chunks, and the log-likelihood
makes up a large part of the parallelized code.
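Randomizing the row order needs only base R. The sketch below uses a hypothetical toy data frame dat whose rows are deliberately sorted by the response y (the ordering advised against above) and permutes them before the data would be passed to brm.

```r
# Hypothetical toy data, deliberately sorted by the response y
set.seed(1)
dat <- data.frame(y = sort(rpois(1000, lambda = 3)), x = rnorm(1000))

# Randomly permute the rows so each chunk gets a mix of response values
dat_shuffled <- dat[sample(nrow(dat)), ]
rownames(dat_shuffled) <- NULL
```

Only the row order changes; the observations themselves, and therefore the model, stay the same.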
if (FALSE) {
  library(brms)
  # this model just serves as an illustration;
  # threading may not actually speed things up here
  fit <- brm(count ~ zAge + zBase * Trt + (1|patient),
             data = epilepsy, family = negbinomial(),
             chains = 1, threads = threading(2, grainsize = 100),
             backend = "cmdstanr")
  summary(fit)
}