Thesis Topics

In our lab, we have open topics for Bachelor and Master theses throughout the entire academic year. Most of them are part of one of our larger research projects. Below, you can find short descriptions of some selected topics. If one of them sounds interesting to you, please contact the person mentioned under Supervision (just click on their name to be forwarded to their profile) and put me in CC. In your email, please don't forget to provide some basic information about yourself, including your field of study, programming skills, and completed courses that you think may be relevant. If none of the topics below fits your interests, but you still want to write your thesis with us, please contact me directly. We are also open to your own ideas, provided they are something we can properly supervise.

Supervision: Florence Bockting

Project: Simulation-Based Prior Distributions for Bayesian Models

Important References:

  • Bockting, F., Radev, S. T., & Bürkner, P. C. (2024). Simulation-based prior knowledge elicitation for parametric Bayesian models. Scientific Reports, 14(1), 17330. Link
  • Kobyzev, I., Prince, S. J., & Brubaker, M. A. (2020). Normalizing flows: An introduction and review of current methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(11), 3964–3979. Link

Tools: Python, TensorFlow

Problem description: In one of our recent methods, we use normalizing flows to learn a joint prior distribution for the parameters in a Bayesian model. A normalizing flow transforms a simple probability distribution (the base distribution) into a complex target distribution (in our case, the joint prior). A common choice for the base distribution is a standard Gaussian. We want to investigate different specifications of the base distribution and evaluate their influence on the learned joint prior when compared to a specific ground truth. This project covers the areas of method development and method implementation.
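
To build intuition for why the base distribution matters, here is a minimal one-dimensional sketch in plain Python (not our actual TensorFlow implementation; the affine transform and the logistic alternative base are illustrative assumptions). It applies the change-of-variables formula that underlies every normalizing flow and shows how the tails of the induced prior depend on the base choice:

```python
import math

# Minimal 1-D normalizing-flow sketch: an affine flow x = a*z + b
# transforms a base density p_z into
#   p_x(x) = p_z((x - b) / a) / |a|   (change-of-variables formula).

def log_prob_gaussian(z):
    # standard Gaussian base (the common default)
    return -0.5 * z * z - 0.5 * math.log(2 * math.pi)

def log_prob_logistic(z):
    # heavier-tailed alternative base distribution
    return -z - 2 * math.log(1 + math.exp(-z))

def flow_log_prob(x, base_log_prob, a=2.0, b=1.0):
    z = (x - b) / a                              # inverse transform
    return base_log_prob(z) - math.log(abs(a))   # + log-det of inverse Jacobian

# The induced prior differs markedly in the tails depending on the base:
print(flow_log_prob(10.0, log_prob_gaussian))
print(flow_log_prob(10.0, log_prob_logistic))
```

With the same affine transform, the logistic base yields much more prior mass far from the center, which is exactly the kind of effect the simulation study would quantify.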

Project structure:

  • Understanding the problem (literature work on normalizing flows)
  • Specify and motivate different base distributions; explain your expectations (methodological work)
  • Implement and run a simulation study (in Python using TensorFlow)
  • Discuss the results and provide recommendations

Supervision: Florence Bockting

Project: Simulation-Based Prior Distributions for Bayesian Models

Important References:

  • Bockting, F., Radev, S. T., & Bürkner, P. C. (2024). Simulation-based prior knowledge elicitation for parametric Bayesian models. Scientific Reports, 14(1), 17330. Link

Tools: Stan, Python, TensorFlow

Problem description: In one of our recent methods, we use normalizing flows (NFs) to learn a joint prior distribution for the parameters in a Bayesian model. The advantage of NFs is that they learn a closed-form analytic function that we can use as a prior for a Bayesian model. The learning algorithm is implemented in Python with TensorFlow. To make use of the learned joint prior distribution in probabilistic models implemented in Stan, we need an implementation of NFs in Stan. This project covers the area of method implementation.
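
As a plain-Python sketch of the kind of transform that would need to be ported: an affine coupling layer (a common normalizing-flow building block; the specific scale and shift functions below are toy stand-ins for trained networks) has a closed-form forward pass, inverse, and log-Jacobian-determinant, which is exactly what a Stan implementation would have to reproduce:

```python
import math

# Toy 2-D affine coupling layer (illustrative stand-in for a trained flow):
#   x1 = z1
#   x2 = z2 * exp(s(z1)) + t(z1)
# Inverse and log-det are available in closed form, which is what makes a
# direct Stan port feasible.

def s(z1):  # toy "learned" scale function
    return 0.5 * math.tanh(z1)

def t(z1):  # toy "learned" shift function
    return 0.3 * z1

def forward(z1, z2):
    return z1, z2 * math.exp(s(z1)) + t(z1)

def inverse(x1, x2):
    return x1, (x2 - t(x1)) * math.exp(-s(x1))

def log_det_jacobian(z1):
    return s(z1)  # log|dx2/dz2| = s(z1)

z = (0.7, -1.2)
x = forward(*z)
z_back = inverse(*x)  # recovers z exactly
```

In Stan, the forward transform would live in the transformed parameters block, with the log-det added to the target density.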

Project structure:

  • Understanding the problem (literature work on normalizing flows; implementation of project in Python)
  • Conceptual work on a potential implementation in Stan
  • Implementation in Stan
  • Run some test examples and discuss results

Supervision: Luna Fazio

Project: Bayesian Distributional Latent Variable Models

Tools: R, Stan

Problem description: Bayesian estimation of structural equation models is more flexible than the frequentist counterpart, but can be significantly slower. Variational methods provide fast approximate model fits, but their performance needs to be assessed systematically to determine when they can be reliably used.

Project structure:

  1. Review relevant concepts and literature.
  2. Set up an appropriate computational environment for simulation studies.
  3. Replicate previously published results.
  4. Extend prior work with additional simulations on models and variational algorithms that have not yet been assessed.
  5. Analyze and discuss results.

Supervision: Soham Mukherjee

Project: Probabilistic Models for Single-Cell RNA Sequencing Data

Tools: R, Stan

Problem description: Composite Gaussian processes (GPs) provide a natural framework for modeling two or more related data-generating processes simultaneously by specifying a joint GP distribution. A particularly interesting case is a composite process of two related GPs defined over a shared input space. A direct application is to model spliced RNA expression levels together with their time-lagged unspliced counterparts as responses, using a common cellular ordering as the input. The primary challenge is to verify whether composite GPs are a suitable mathematical framework for such strictly related data-generating processes.
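
The core object in such a model is the joint covariance matrix over both outputs. Here is a stdlib-Python sketch of how a composite covariance could be assembled (the RBF kernel, the fixed time lag, and the specific lag construction are illustrative assumptions, not the model we will ultimately use):

```python
import math

# Composite GP sketch: two outputs f (spliced) and g (unspliced) share one
# latent input t, with g modeled as a time-lagged copy of f's process:
#   cov(f(t), f(t')) = k(t, t')
#   cov(f(t), g(t')) = k(t, t' - lag)
#   cov(g(t), g(t')) = k(t - lag, t' - lag) = k(t, t')  (stationary kernel)
# The joint covariance over (f(t_1..n), g(t_1..n)) is a 2n x 2n block matrix.

def k(t1, t2, length=1.0):
    # squared-exponential (RBF) kernel
    return math.exp(-0.5 * ((t1 - t2) / length) ** 2)

def composite_cov(ts, lag=0.5):
    n = len(ts)
    K = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            K[i][j] = k(ts[i], ts[j])            # f-f block
            K[n + i][n + j] = k(ts[i], ts[j])    # g-g block
            K[i][n + j] = k(ts[i], ts[j] - lag)  # f-g cross-covariance
            K[n + j][i] = K[i][n + j]            # g-f block (transpose)
    return K

K = composite_cov([0.0, 0.4, 0.9, 1.5])  # 8 x 8 joint covariance
```

In Stan, a matrix of this form would parameterize a multivariate-normal prior over the stacked latent function values, with the lag and kernel length-scale as model parameters.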

Project structure:

  • Literature review and understanding of GPs and composite GPs.
  • Model implementation for composite GPs in Stan.
  • Simulation studies for model validation.
  • Applications to example case studies using single-cell data.

Supervision: Šimon Kucharský

Project: Applications of Amortized Bayesian Inference

Tools: Python, Stan

Problem description: Forecasting with Bayesian models is becoming more popular. Some forecasting applications require making predictions at scale. Traditional Bayesian methods (MCMC) may be too slow for such applications. Amortization can solve this issue, as it can be considerably faster at inference (forecasting) time than MCMC while still providing full Bayesian estimates.
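
The amortization idea in miniature (stdlib Python; the conjugate-Gaussian toy model and the one-weight linear "network" are illustrative stand-ins for a Prophet-style model and BayesFlow's neural networks): pay the training cost once on simulated data, and every subsequent inference is a cheap forward pass.

```python
import random

random.seed(0)

# Amortized inference in miniature. Toy model:
#   mu ~ Normal(0, 1),  y_1..y_n ~ Normal(mu, 1),  n = 5
# The exact posterior mean is n * ybar / (n + 1). We "amortize" by fitting
# a one-weight predictor mu_hat = w * ybar on simulated (parameter, data)
# pairs; afterwards, inference for new data is a single multiplication
# instead of an MCMC run.

N = 5

def simulate_pair():
    mu = random.gauss(0, 1)
    ybar = sum(random.gauss(mu, 1) for _ in range(N)) / N
    return mu, ybar

# "Training": closed-form least squares for the single weight w.
pairs = [simulate_pair() for _ in range(20000)]
w = sum(m * y for m, y in pairs) / sum(y * y for _, y in pairs)

# "Inference" on new data is now instant:
ybar_new = 1.2
posterior_mean_amortized = w * ybar_new
posterior_mean_exact = N * ybar_new / (N + 1)
print(w, posterior_mean_amortized, posterior_mean_exact)
```

The learned weight recovers the analytic posterior-mean coefficient; in the real project, the linear map is replaced by a neural network and the toy model by forecasting-model simulations.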

Project structure:

  1. Review relevant concepts and literature.
  2. Get familiarized with BayesFlow.
  3. Select a subset of the model features provided by Prophet and implement it with BayesFlow for amortized forecasting.
  4. Validate the model.
  5. Analyze and discuss results.

Supervision: Šimon Kucharský and Daniel Habermann

Projects:

  • Applications of Amortized Bayesian Inference
  • Amortized Bayesian Inference for Multilevel Models

Tools: Python, Stan

Problem description: BayesFlow was recently expanded to handle two-level hierarchical models (for details, talk to Daniel) and is currently being extended to support inference for mixture models (for details, talk to Šimon). The idea of this project is to combine the two approaches and investigate how to implement amortized inference for hierarchical mixture models with BayesFlow.
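
Amortized inference starts from a simulator. Here is a stdlib-Python sketch of a two-level hierarchical mixture generator of the kind that would be handed to BayesFlow (the hyperpriors, the two-component structure, and all numbers are illustrative assumptions):

```python
import random

random.seed(2)

# Simulator for a toy two-level hierarchical mixture model:
#   group level:  theta_g ~ Normal(mu, tau)
#   observation:  with prob pi     -> Normal(theta_g, 1)          (component 1)
#                 with prob 1 - pi -> Normal(theta_g + delta, 1)  (component 2)
# Amortized inference would train a network on many such draws of
# (parameters, data set) pairs.

def simulate(n_groups=8, n_per_group=20):
    # hyperprior draws (illustrative choices)
    mu = random.gauss(0, 1)
    tau = abs(random.gauss(0, 1))
    pi = random.uniform(0.2, 0.8)
    delta = abs(random.gauss(3, 1))
    data = []
    for _ in range(n_groups):
        theta = random.gauss(mu, tau)  # group-level mean
        group = []
        for _ in range(n_per_group):
            if random.random() < pi:
                group.append(random.gauss(theta, 1))
            else:
                group.append(random.gauss(theta + delta, 1))
        data.append(group)
    return {"mu": mu, "tau": tau, "pi": pi, "delta": delta}, data

params, data = simulate()
```

The interesting design question for the thesis is how the summary networks for the group level and the mixture components should be combined.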

Project structure:

  1. Review relevant concepts and literature.
  2. Get familiarized with BayesFlow and the current implementation of hierarchical and mixture models.
  3. Implement a basic hierarchical mixture model with BayesFlow as a proof of concept.
  4. Validate the model.
  5. Analyze and discuss results.

Supervision: Lars Kühmichel

Project: BayesFlow: Simulation Intelligence with Deep Learning

Tools: Python, Keras 3

Problem Description: BayesFlow is a Python library for simulation-based amortized Bayesian inference with neural networks. It aims to provide users with a rich collection of neural network architectures. Diffusion Models are a modern and powerful type of generative neural network that particularly excel at creating high-quality samples, even for complex, high-dimensional distributions. The goal of this project is to implement Diffusion Models and make them available within BayesFlow.
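
The heart of a diffusion model is its forward noising process. A one-dimensional stdlib-Python sketch (DDPM-style variance-preserving diffusion; the linear beta schedule and step count are illustrative assumptions):

```python
import math
import random

random.seed(3)

# Variance-preserving forward diffusion (DDPM-style), in one dimension:
#   x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,  eps ~ N(0, 1)
# where alpha_bar_t = prod_{s<=t} (1 - beta_s). A generative network is then
# trained to predict eps (denoising); sampling runs the process in reverse.

T = 100
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]  # linear schedule

alpha_bars = []
prod = 1.0
for b in betas:
    prod *= 1.0 - b
    alpha_bars.append(prod)

def forward_diffuse(x0, t):
    ab = alpha_bars[t]
    eps = random.gauss(0, 1)
    return math.sqrt(ab) * x0 + math.sqrt(1 - ab) * eps

# For a unit-variance x0, the marginal variance of x_t stays ~1 at every t,
# while the signal weight sqrt(alpha_bar_t) decays toward pure noise.
print(alpha_bars[0], alpha_bars[-1])
```

The project work then consists of implementing the learned reverse (denoising) process and integrating it with BayesFlow's inference interfaces.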

Project structure:

  • Literature search on amortized Bayesian inference and Diffusion Models
  • Draft your own first implementation of Stable Diffusion
  • Use your implementation to reproduce a generative deep learning paper
  • Port your implementation into BayesFlow
  • Use your ported implementation to reproduce an amortized Bayesian inference paper

Supervision: Paul Bürkner

Tools: R, Stan

Problem description: Phylogenetic trees are used to represent the evolutionary history between a set of species. The implied dependencies between species can be expressed statistically via phylogenetic models. However, these models are often hard to estimate and may require strong prior knowledge, which we express in Bayesian statistics via prior distributions. The goal of this thesis is to implement and evaluate joint prior distributions for the variance parameters of phylogenetic models. We have concrete ideas for such priors, which we will discuss with you upon starting your thesis.
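
One common way to build a joint prior over several variance parameters is to draw an overall scale and split it across components with a simplex (a decomposition-style sketch in stdlib Python; this illustrates the general idea only, not the concrete priors we have in mind, which we will discuss at the start of the thesis):

```python
import random

random.seed(4)

# Joint prior sketch for K variance components (decomposition style):
#   total ~ half-Normal(0, 1)             (overall scale)
#   (phi_1..phi_K) ~ Dirichlet(1, ..., 1) (how the variance is allocated)
#   sigma2_k = total^2 * phi_k
# The Dirichlet draw uses its Gamma representation: normalized iid Gamma(1, 1).

def joint_variance_prior(K=3):
    total = abs(random.gauss(0, 1))
    gammas = [random.gammavariate(1.0, 1.0) for _ in range(K)]
    s = sum(gammas)
    phi = [g / s for g in gammas]
    return total, [total ** 2 * p for p in phi]

total, sigma2 = joint_variance_prior()
# By construction, the components always sum to the squared overall scale.
```

The appeal for phylogenetic models is that prior knowledge can be expressed on the interpretable total variance, while the allocation across tree-structured components is learned jointly.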

Project structure:

  • Literature search on priors for phylogenetic models
  • Implement the new and existing priors in Stan
  • Run simulation studies
  • Analyse and discuss results

Supervision: Javier Enrique Aguilar

Project: Intuitive Joint Priors for Bayesian Multilevel Models

Relevant literature:

Tools: R, Stan

Problem description: Continuous global-local shrinkage priors have gained significant traction in Bayesian statistics due to their ability to enhance predictive performance while decreasing bias. In this project, we aim to analyze how several well-known shrinkage priors target and shrink strong signals within the data.
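
A useful quantity for this kind of analysis is the shrinkage factor. A stdlib-Python sketch for the normal-means setting (unit noise variance is assumed; the horseshoe's half-Cauchy local scales are simulated via the tangent transform of a uniform):

```python
import math
import random

random.seed(5)

# Shrinkage factor in the normal-means model y_j = theta_j + e_j, e_j ~ N(0, 1),
# with a global-local prior theta_j ~ N(0, tau^2 * lambda_j^2):
#   E[theta_j | y_j] = (1 - kappa_j) * y_j,  kappa_j = 1 / (1 + tau^2 * lambda_j^2)
# kappa_j near 1 means heavy shrinkage; near 0 means the signal is left alone.
# For the horseshoe, lambda_j ~ half-Cauchy(0, 1).

def half_cauchy():
    # inverse-CDF sampling: |tan(pi * (U - 1/2))| is half-Cauchy(0, 1)
    return abs(math.tan(math.pi * (random.random() - 0.5)))

tau = 0.1  # small global scale: shrink everything unless a local scale escapes
kappas = []
for _ in range(10000):
    lam = half_cauchy()
    kappas.append(1.0 / (1.0 + tau ** 2 * lam ** 2))

# The heavy Cauchy tail lets some kappa_j fall near 0, so strong signals can
# escape shrinkage even though most coefficients are shrunk hard.
frac_escaped = sum(k < 0.1 for k in kappas) / len(kappas)
print(frac_escaped)
```

Characterizing how this tail behavior differs across priors (horseshoe, Laplace, and others) is the core of the proposed simulations.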

Project structure:

  • Familiarize yourself with shrinkage priors
  • Characterize the shrinkage on strong signals
  • Carry out simulations
  • Analyse and discuss the results