# Research

Probabilistic (Bayesian) approaches to statistics and machine learning have become increasingly popular due to new developments in probabilistic programming languages and associated learning algorithms as well as a steady increase in overall computing power. Probabilistic programming languages make it easier to specify and fit Bayesian models, but this still leaves us with many options regarding constructing, evaluating, and using these models, along with many remaining challenges in computation. My overarching scientific goal is to develop principled Bayesian workflows that comprise the whole scientific process from design of studies, data gathering and cleaning over model building, calibration, fitting and evaluation, to the post-processing and statistical decision making. As such, we are working on a wide range of research topics related to the development, evaluation, implementation, or application of Bayesian methods. Some of my current core research areas are detailed below.

## Uncertainty Quantification

In experiments and observational studies, scientists gather data to learn more about the world. However, what we can learn from a single data set is always limited, and we are inevitably left with some remaining uncertainty. It is of high importance to take this uncertainty into account when drawing conclusions if we want to make real scientific progress. Formalizing and quantifying uncertainty is thus at the heart of statistical methods aiming to obtain insights from data. In my lab, all projects, in one way or the other, deal with uncertainty quantification and propagation, primarily through sampling-based methods.

## Prior Specification

Specification of prior distributions for a Bayesian model is a central part of the Bayesian workflow for data analysis, but it is often difficult even for statistical experts. Prior elicitation transforms domain knowledge of various kinds into well-defined prior distributions, and offers a solution to the prior specification problem, in principle. In practice, however, we are still far from having usable prior elicitation tools that could significantly influence the way we build probabilistic models especially for high-dimensional problems. We are approaching this challenge from two perspectives, (a) by developing intuitive joint prior distributions that yield sensible prior predictions even in high-dimensional spaces and (b) by building prior elicitation tools that transform expert knowledge in the data space into prior distributions on the model parameters that are consistent with that knowledge while satisfying additional probabilistic constraints.

Current projects:

## Amortized Inference

Most Bayesian inference algorithms have to be re-run from scratch for every new dataset or change in prior assumptions, with every run requiring considerable time and computational resources. As a result, Bayesian inference is usually infeasible in situations that require a lot of model re-fits or when results need to be available in real-time. The new field of Amortized Bayesian inference (ABI) offers a path towards solving these challenges. In a nutshell, ABI consists of (1) a training phase where neural networks distill relevant information from any probabilistic model and (2) an inference phase where the networks infer the hidden parameters of the model in real time for any new query. Currently, existing ABI methods only work reliably for relatively simple models and there remain several open challenges regarding the accuracy, scalability, and robustness of these methods; challenges that my lab aims to address in the upcoming years. In the process, we will also bridge the gap between simulation-based and likelihood-based Bayesian inference, thus maximizing the information usable during both training and inference. If you want to learn more about the field of amortized inference, please check out our curated awesome-amortized-inference list of resources and references.

Current projects:

## Latent Variable Modeling

Latent variables are not directly observable, yet they often represent a core part of a scientific theory. For example, psychologists model intelligence and personality, biologists study properties of viruses and bacteria, and economists aim to understand the underlying properties of a market. Statistical methods for modeling latent variables based on manifest (observable) indicators are thus crucial to the scientific progress in those fields. When using modern Bayesian inference approaches, latent variables can be represented as parameters, whose posterior distribution can thus be learned directly from data along with all other model parameters. However, Bayesian latent variable models are also highly challenging to estimate. Not only are they computationally demanding, but they also frequently suffer from convergence issues as well as challenges in choosing the right parameterization and appropriate prior distributions. In our research, we are tackling the specification, estimation, and evaluation of Bayesian latent variable models from various different angles, including amortized and non-amortized approaches.

Current projects:

## Model Comparison

Numerous research questions in basic science are concerned with comparing multiple scientific theories to understand which of them is more likely to be true, or at least closer to the truth. To compare these theories, scientists translate them into statistical models and then investigate how well the models’ predictions match the gathered real-world data. Even if the goal is purely predictive, model comparison is very important for predictive model selection or averaging. In my lab, we are exploring Bayesian model comparison approaches from both theory-driven and predictive perspectives and even seek to find ways to combine both perspectives.

Current projects:

## Machine-Assisted Workflows

Building Bayesian models in a principled way remains a highly complex task requiring a lot of expertise and cognitive resources. Ideally, subject matter experts do not have to solve everything by themselves but have statisticians or data scientists by their side to assist them. Of course, the latter are not always available for every data-analysis project. As a remedy we are developing machine-assisted workflows for building interpretable, robust, and well-predicting Bayesian models. This first requires more research on the theoretical foundations of Bayesian model building. With this in hand, machines will be trained to provide automatic model evaluation and modeling recommendations that guide the user through the model building process. While leaving the modeling choices up to the user, the machine subsequently learns from the user’s decisions to improve its recommendations on the fly.

Current projects: