Project Details
Robust sampling for Bayesian neural networks
Applicants
Professor Dr. Daniel Rudolf; Professor Dr. Claudia Schillings; Professor Dr. Björn Sprungk
Subject Area
Mathematics
Term
since 2023
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 522337282
The project develops efficient numerical methods for the simulation and evaluation of Bayesian neural networks, which enables uncertainty quantification in deep learning. The uncertainty in the trained network is represented by the posterior distribution of the network parameters. This distribution results from conditioning a prior distribution, often a Gaussian one, on the available training data. The posterior distribution thus yields a probability measure on the set of neural networks (with the same architecture). In addition, it determines a predictive distribution for the output of the network, which quantifies the uncertainty of the network's predictions in practice. Numerical procedures for Bayesian neural networks are given by sampling methods, which aim at the (approximate) simulation of the posterior distribution, and by variational approaches, which compute a cheap proxy of it. The main challenges here are the high dimensionality and high concentration of the posterior distribution, which result from the sheer size of deep neural networks and from strongly informative training data, e.g., in a big data regime. In this project, we develop efficient procedures to evaluate the posterior distribution, in particular for deep neural networks with very informative data. This gives us access to the predictive distribution and thus enables uncertainty quantification for deep learning. We proceed as follows:
1. In a first step, we investigate the Laplace approximation of the posterior distribution. For increasingly informative training data, as well as in situations where the posterior is high- or infinite-dimensional, we are interested in the speed of convergence of the Laplace approximation to the posterior. This analysis will justify the use of the cheap Laplace proxy for deep neural networks in big data regimes (a sketch of the proxy is given below).
2. The Laplace approximation can also be regarded as a tool to robustify sampling schemes, such as importance sampling, quasi-Monte Carlo, and Markov chain Monte Carlo, with respect to the concentration properties of the posterior. Despite its approximative nature, the Laplace proxy carries implicit information about the posterior from which sampling methods can benefit. We will study Laplace-based sampling approaches theoretically and, in particular, aim to prove rigorously that these methods circumvent the curse of dimensionality and do not deteriorate with increasing concentration (an illustrative sampling sketch follows below). The theoretical work will be verified and guided by numerical experiments on benchmark problems.
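For orientation, the following is a minimal sketch of the quantities described above in generic notation; the symbols (parameters θ, network f_θ, training data D, prior π₀) are illustrative and not taken from the project description.

```latex
% Sketch: posterior and predictive distribution of a Bayesian neural network f_\theta
% with parameters \theta, prior \pi_0, and training data D = \{(x_i, y_i)\}_{i=1}^n.
\begin{align*}
  \pi(\theta \mid D)
    &\propto \pi_0(\theta) \prod_{i=1}^{n} p\bigl(y_i \mid f_\theta(x_i)\bigr)
    && \text{(posterior: prior conditioned on the training data)} \\
  p(y^\ast \mid x^\ast, D)
    &= \int p\bigl(y^\ast \mid f_\theta(x^\ast)\bigr)\, \pi(\theta \mid D)\, \mathrm{d}\theta
    && \text{(predictive distribution for a new input } x^\ast\text{)}
\end{align*}
```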
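The Laplace proxy referred to in step 1 can be sketched, in the same generic notation, as the Gaussian obtained from a second-order expansion of the log-posterior at its mode:

```latex
% Sketch: Laplace approximation of the posterior, i.e. a Gaussian centred at the
% MAP estimate \hat\theta with covariance given by the inverse Hessian of the
% negative log-posterior at \hat\theta.
\begin{align*}
  \hat\theta &= \arg\max_{\theta}\, \log \pi(\theta \mid D), \qquad
  H = -\nabla^2_{\theta} \log \pi(\theta \mid D)\big|_{\theta = \hat\theta}, \\
  \pi(\theta \mid D) &\approx \mathcal{N}\bigl(\hat\theta,\, H^{-1}\bigr)
  \qquad \text{(Laplace proxy; cheap to evaluate once } \hat\theta \text{ and } H \text{ are available)}
\end{align*}
```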
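To make the robustification idea of step 2 concrete, here is a minimal, self-contained Python sketch of self-normalized importance sampling with a Laplace proposal on a hypothetical two-dimensional toy target. The function names (log_post, laplace_fit, laplace_importance_sampling), the toy density, and all numerical choices are illustrative assumptions, not the project's actual methods or benchmark problems.

```python
import numpy as np

# Sketch: self-normalized importance sampling with a Laplace proposal.
# Toy 2-D unnormalized log-posterior; in the project's setting this would be the
# (unnormalized) log-posterior of a Bayesian neural network.

def log_post(theta):
    # Mildly non-Gaussian toy target (banana-shaped density), unnormalized.
    x, y = theta
    return -0.5 * (x**2 + 5.0 * (y - 0.3 * x**2) ** 2)

def laplace_fit(log_p, theta0, steps=500, lr=0.05, eps=1e-4):
    """Crude Laplace fit: gradient ascent to the mode, finite-difference Hessian."""
    theta = np.asarray(theta0, dtype=float)
    d = theta.size
    grad = lambda t: np.array(
        [(log_p(t + eps * e) - log_p(t - eps * e)) / (2 * eps) for e in np.eye(d)]
    )
    for _ in range(steps):                      # gradient ascent towards the MAP
        theta = theta + lr * grad(theta)
    H = np.zeros((d, d))                        # Hessian of -log_p at the mode
    for i, ei in enumerate(np.eye(d)):
        H[i] = -(grad(theta + eps * ei) - grad(theta - eps * ei)) / (2 * eps)
    return theta, np.linalg.inv(H)              # mean and covariance of the Laplace proxy

def laplace_importance_sampling(log_p, mean, cov, n=10_000, rng=None):
    """Estimate the posterior mean using the Laplace proxy as proposal distribution."""
    rng = np.random.default_rng() if rng is None else rng
    samples = rng.multivariate_normal(mean, cov, size=n)
    diff = samples - mean
    log_q = -0.5 * np.einsum("ni,ij,nj->n", diff, np.linalg.inv(cov), diff)
    log_w = np.array([log_p(s) for s in samples]) - log_q   # unnormalized log-weights
    w = np.exp(log_w - log_w.max())
    w /= w.sum()                                 # self-normalization
    return w @ samples                           # weighted posterior-mean estimate

mode, cov = laplace_fit(log_post, theta0=[1.0, 1.0])
print("Laplace mode:", mode)
print("IS posterior-mean estimate:", laplace_importance_sampling(log_post, mode, cov))
```

The same pattern carries over, in principle, to the other schemes named above: the Gaussian proposal draws could be replaced by transformed low-discrepancy points (quasi-Monte Carlo) or used to build proposals for Markov chain Monte Carlo.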
DFG Programme
Research Grants