R/pp_mixture.R
pp_mixture.brmsfit.Rd
Compute the posterior probabilities of mixture component memberships for each observation, including uncertainty estimates.
# S3 method for brmsfit
pp_mixture(
x,
newdata = NULL,
re_formula = NULL,
resp = NULL,
ndraws = NULL,
draw_ids = NULL,
log = FALSE,
summary = TRUE,
robust = FALSE,
probs = c(0.025, 0.975),
...
)
pp_mixture(x, ...)
x
An R object, usually of class brmsfit.
newdata
An optional data.frame for which to evaluate predictions. If NULL (default), the original data of the model is used. NA values within factors are interpreted as if all dummy variables of this factor are zero. This allows, for instance, predictions of the grand mean when using sum coding.
re_formula
A formula containing the group-level effects to be considered in the prediction. If NULL (default), all group-level effects are included; if NA, no group-level effects are included.
resp
Optional names of response variables. If specified, predictions are performed only for the specified response variables.
ndraws
Positive integer indicating how many posterior draws should be used. If NULL (the default), all draws are used. Ignored if draw_ids is not NULL.
draw_ids
An integer vector specifying the posterior draws to be used. If NULL (the default), all draws are used.
log
Logical; indicates whether to return probabilities on the log scale.
summary
Logical; should summary statistics be returned instead of the raw values? Default is TRUE.
robust
If FALSE (the default), the mean is used as the measure of central tendency and the standard deviation as the measure of variability. If TRUE, the median and the median absolute deviation (MAD) are used instead. Only used if summary is TRUE.
probs
The percentiles to be computed by the quantile function. Only used if summary is TRUE.
...
Further arguments passed to prepare_predictions that control several aspects of data validation and prediction.
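The arguments above can be combined in a single call. The following sketch wraps a few such calls in a helper. The names `membership_variants`, `fit`, and `new_obs` are hypothetical, and `fit` is assumed to be a brmsfit object from a mixture model. Because the likelihood of each observation enters the computation, `new_obs` is assumed to contain the response as well as the predictors.

```r
# Hypothetical helper; nothing is evaluated until it is called with a
# fitted mixture model (assumed to be a brmsfit object)
membership_variants <- function(fit, new_obs) {
  list(
    # summaries for new observations, ignoring group-level effects
    new_summary = brms::pp_mixture(fit, newdata = new_obs, re_formula = NA),
    # raw values (S x N x K) based on 100 posterior draws
    raw_draws = brms::pp_mixture(fit, ndraws = 100, summary = FALSE),
    # median/MAD summaries with 5% and 95% quantiles, on the log scale
    robust_log = brms::pp_mixture(fit, log = TRUE, robust = TRUE,
                                  probs = c(0.05, 0.95))
  )
}
```

Calling `membership_variants(fit1, dat[1:5, ])` with a fitted model would return a list of three arrays; fitting the model itself is not shown here.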
If summary = TRUE, an N x E x K array, where N is the number of observations, K is the number of mixture components, and E is equal to length(probs) + 2.
If summary = FALSE, an S x N x K array, where S is the number of posterior draws.
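To make the two array shapes concrete, the sketch below collapses a simulated S x N x K array of draws into an N x E x K array using base R. The draws are random numbers rather than output from a fitted model, and the helper `summarise_membership` is an illustrative assumption, not the package's internal code.

```r
set.seed(1)
S <- 200; N <- 5; K <- 2
probs <- c(0.025, 0.975)

# Simulated membership probabilities for component 1;
# component 2 is the complement, so each draw sums to 1 over components
p1 <- matrix(rbeta(S * N, 2, 5), nrow = S, ncol = N)
draws <- array(c(p1, 1 - p1), dim = c(S, N, K))

# Hypothetical helper: collapse the draws dimension into E summary columns
summarise_membership <- function(draws, probs, robust = FALSE) {
  center <- if (robust) median else mean
  spread <- if (robust) mad else sd
  one <- function(v) c(center(v), spread(v), quantile(v, probs, names = FALSE))
  out <- apply(draws, c(2, 3), one)  # E x N x K
  aperm(out, c(2, 1, 3))             # N x E x K
}

smry <- summarise_membership(draws, probs)
dim(smry)  # N, length(probs) + 2, K
```

The second dimension holds the central tendency, the variability, and one entry per requested quantile, which is where E = length(probs) + 2 comes from.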
The returned probabilities can be written as
\(P(K_n = k \mid Y_n)\), that is, the posterior probability
that observation n originates from component k.
They are computed using Bayes' theorem
$$P(K_n = k \mid Y_n) = \frac{P(Y_n \mid K_n = k) \, P(K_n = k)}{P(Y_n)},$$
where \(P(Y_n \mid K_n = k)\) is the (posterior) likelihood
of observation n for component k, \(P(K_n = k)\) is
the (posterior) mixing probability of component k
(i.e. parameter theta<k>), and
$$P(Y_n) = \sum_{k=1}^{K} P(Y_n \mid K_n = k) \, P(K_n = k)$$
is a normalizing constant.
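As a rough illustration of the formula above, the following base R sketch computes membership probabilities for a single set of parameter values, on both the log and the natural scale. The component means, standard deviation, and mixing probabilities are made-up values standing in for one posterior draw; they do not come from a fitted model, and the code is not meant to mirror how the package performs the computation internally.

```r
# Hypothetical parameter values standing in for one posterior draw
theta <- c(0.6, 0.4)   # mixing probabilities P(K_n = k)
mu    <- c(0, 2)       # component means
sigma <- 1             # shared residual standard deviation
y     <- c(-1, 1, 3)   # three example observations

# log P(Y_n | K_n = k) + log P(K_n = k), one column per component
log_joint <- sapply(seq_along(theta), function(k) {
  dnorm(y, mean = mu[k], sd = sigma, log = TRUE) + log(theta[k])
})

# log P(Y_n) via a numerically stable log-sum-exp over components
row_max <- apply(log_joint, 1, max)
log_py  <- row_max + log(rowSums(exp(log_joint - row_max)))

# Posterior membership probabilities on the log and natural scale
log_post <- log_joint - log_py
post     <- exp(log_post)

rowSums(post)  # each row sums to 1, up to floating point error
```

Working on the log scale and normalizing with log-sum-exp avoids underflow when the component likelihoods are very small.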
if (FALSE) {
## simulate some data
set.seed(1234)
dat <- data.frame(
  y = c(rnorm(100), rnorm(50, 2)),
  x = rnorm(150)
)
## fit a simple normal mixture model
mix <- mixture(gaussian, nmix = 2)
## mu1 and mu2 are distributional parameters, hence dpar
prior <- c(
  prior(normal(0, 5), Intercept, dpar = mu1),
  prior(normal(0, 5), Intercept, dpar = mu2),
  prior(dirichlet(2, 2), theta)
)
fit1 <- brm(bf(y ~ x), dat, family = mix,
            prior = prior, chains = 2, init = 0)
summary(fit1)
## compute the membership probabilities
ppm <- pp_mixture(fit1)
str(ppm)
## extract point estimates for each observation
head(ppm[, 1, ])
## classify every observation according to
## the most likely component
apply(ppm[, 1, ], 1, which.max)
}