Probability Seminar Essen

Summer term 2022

Most of the talks take place via Zoom at this link

Apr 12

Stefan Ankirchner (University of Jena)

Approximating stochastic gradient descent with diffusions: error expansions and impact of learning rate schedules

Applying a stochastic gradient descent method to minimize an objective function gives rise to a discrete-time process of estimated parameter values. To better understand the dynamics of these estimates, it can make sense to approximate the discrete-time process with a continuous-time diffusion. We refine some results on the weak error of diffusion approximations. In particular, we explicitly compute the leading term in the error expansion of an ODE approximation with respect to a parameter discretizing the learning rate schedule. The leading term changes if one extends the ODE with a Brownian diffusion component. Finally, we show that if the learning rate is time-varying, then its rate of change needs to enter the drift coefficient in order to obtain an approximation of order 2.
The talk is based on joint work with Stefan Perko.
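As a toy illustration of the approximation picture in the abstract (my sketch, not the speakers' construction): for the quadratic objective f(θ) = θ²/2, SGD with a small constant learning rate η stays close to its gradient-flow ODE limit θ'(t) = -θ(t), whose solution is θ(t) = θ₀·e^(-t). All parameter values below are illustrative.

```python
import math
import random

def sgd_path(theta0, eta, n_steps, noise_std, seed=0):
    """Run SGD with constant learning rate eta on f(theta) = theta^2 / 2,
    where each gradient evaluation is corrupted by Gaussian noise."""
    rng = random.Random(seed)
    theta = theta0
    for _ in range(n_steps):
        grad = theta + rng.gauss(0.0, noise_std)  # noisy gradient estimate
        theta -= eta * grad
    return theta

theta0, eta, n = 1.0, 0.01, 1000
t = eta * n                        # elapsed "continuous" time of the ODE
ode_limit = theta0 * math.exp(-t)  # gradient-flow prediction theta(t)
sgd_final = sgd_path(theta0, eta, n, noise_std=0.1)
print(ode_limit, sgd_final)        # both close to 0 for small eta
```

For small η the two trajectories agree to leading order; the talk's refinements concern exactly how the error expands in the discretization parameter and how a time-varying learning rate enters the drift.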

May 10


May 17

Steffen Dereich (University of Münster)

On the existence of optimal shallow networks

In this talk we discuss the existence of global minima in optimisation problems over shallow neural networks. More explicitly, the function class over which we minimise is the family of all functions that can be expressed as artificial neural networks with one hidden layer featuring a specified number of neurons with ReLU (or Leaky ReLU) activation and one linear neuron (without activation function). We give existence results. Moreover, we provide counterexamples that illustrate the relevance of the assumptions imposed in the theorems.
The talk is based on joint work with Arnulf Jentzen (Münster) and Sebastian Kassing (Bielefeld).
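The function class described above can be written as x ↦ Σᵢ cᵢ · ReLU(aᵢx + bᵢ) + d. A minimal one-dimensional sketch (parameter names a, b, c, d are illustrative, not from the talk):

```python
# Shallow network: one ReLU hidden layer, one linear output neuron.

def relu(x):
    return max(x, 0.0)

def shallow_net(x, a, b, c, d):
    """f(x) = sum_i c[i] * ReLU(a[i] * x + b[i]) + d  (1-d input)."""
    return sum(ci * relu(ai * x + bi) for ai, bi, ci in zip(a, b, c)) + d

# Example: two hidden neurons exactly realizing the absolute value |x|,
# since ReLU(x) + ReLU(-x) = |x|.
a, b, c, d = [1.0, -1.0], [0.0, 0.0], [1.0, 1.0], 0.0
print(shallow_net(3.0, a, b, c, d))   # 3.0
print(shallow_net(-2.5, a, b, c, d))  # 2.5
```

The optimisation problem of the talk minimises a loss over all such parameter tuples (a, b, c, d) for a fixed number of hidden neurons; the existence question is whether that infimum is attained.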

May 24

Tobias Werner (University of Kassel)

Deep neural networks overcome the curse of dimensionality in the numerical approximation of semilinear partial differential equations

Deep-learning based algorithms are employed in a wide range of real-world applications and are nowadays the standard approach for most machine-learning problems. They are used extensively in face and speech recognition, fraud detection, function approximation, and the solution of partial differential equations (PDEs). The latter are utilized to model numerous phenomena in nature, medicine, economics, and physics. Often, these PDEs are nonlinear and high-dimensional. For instance, in the famous Black-Scholes model the PDE dimension $d \in \mathbb{N}$ corresponds to the number of stocks considered in the model. Relaxing the rather unrealistic assumptions made in the model results in a loss of linearity in the corresponding Black-Scholes PDE.
These PDEs, however, cannot in general be solved explicitly and therefore need to be approximated numerically. Classical grid-based approaches such as finite difference or finite element methods suffer from the curse of dimensionality in the sense that the computational effort grows exponentially in the PDE dimension $d \in \mathbb{N}$, and are therefore not appropriate. Among the most prominent and promising approaches to solving high-dimensional PDEs are deep-learning algorithms, which seem to handle high-dimensional problems well. However, compared to the vast field of applications to which these techniques are successfully applied nowadays, there exist only a few theoretical results proving that deep neural networks do not suffer from the curse of dimensionality in the numerical approximation of partial differential equations.
I will present a result stating that deep neural networks are capable of approximating solutions of semilinear Kolmogorov PDEs in the case of gradient-independent, Lipschitz-continuous nonlinearities, while the required number of parameters in the networks grows at most polynomially in both the dimension $d \in \mathbb{N}$ and the reciprocal of the prescribed accuracy $\varepsilon$. Previously, this had only been proven in the case of semilinear heat equations.
The result is purely deterministic. However, the proof relies heavily on probabilistic tools, in particular on full-history recursive multilevel Picard (MLP) approximations.
This talk is based on joint work with Martin Hutzenthaler and Petru A. Cioica-Licht.
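The curse of dimensionality contrasted in the abstract can be made concrete with a quick count (a hedged back-of-the-envelope illustration, not part of the talk): a tensor grid with N points per coordinate axis needs N^d points in dimension d, whereas a polynomial bound such as C·d³ stays tractable. The constants N and C below are illustrative.

```python
# Exponential vs. polynomial growth in the dimension d.

N = 10   # grid points per coordinate axis (grid-based method)
C = 100  # illustrative constant in a polynomial parameter bound

for d in (1, 2, 5, 10, 100):
    grid_points = N ** d     # exponential in d: curse of dimensionality
    poly_bound = C * d ** 3  # polynomial in d: the kind of bound proved
    print(f"d={d:3d}  grid={grid_points:.3e}  poly={poly_bound}")
```

Already at d = 100 the grid count (10^100) exceeds any feasible computation, while the polynomial bound remains modest; results of the kind presented in the talk establish bounds of the latter type for the network size.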

May 31

Josué Nussbaumer (Université Gustave Eiffel)

Algebraic two-level measure trees

Wolfgang Löhr and Anita Winter introduced algebraic trees, which generalize the notion of graph-theoretic trees to potentially uncountable structures. They equipped the space of binary algebraic measure trees with a topology that relies on the Gromov-weak convergence of particular metric representations. By encoding binary algebraic measure trees with triangulations of the circle, they showed that this topology is compact and equivalent to the sample shape convergence on this subspace. We extended these results to a two-level setup, where algebraic trees are equipped with a probability measure on the set of probability measures. To do so, we encoded algebraic two-level measure trees with triangulations of the circle together with a two-level measure on the circle. As an application, we constructed the algebraic nested Kingman coalescent.

Jun 07 Katharina Pohl (University of Duisburg-Essen)
Jun 14  
Jun 21  
Jul 05 Barbara Rüdiger-Mastandrea (University of Wuppertal)
Jul 12  

Talks of previous terms.


Tuesdays, 16:15–17:15

online or WSC-S-U-3.01

Organizer: Martin Hutzenthaler