
 Program (Paris time, UTC+1)


Monday 11 January

09:50am–10:00am  Welcome message



Morning chair: Gabriele Steidl 
10:00am–10:40am
10:40am–11:20am  Ninon Burgos (CNRS, Brain and Spine Institute, ARAMIS Lab, France) – Improving the interpretability of computer-assisted analysis tools in neuroimaging (abstract) (slides) (talk)
11:20am–12:00pm






Lunch break 






Afternoon chair: Michael Möller 
02:00pm–02:40pm
02:40pm–03:20pm






Coffee break with virtual croissants 




03:40pm–04:20pm
04:20pm–05:00pm  Dejan Slepčev (Carnegie Mellon University, USA) – Consistency of higher-order derivatives on random point clouds and semi-supervised learning (abstract) (slides) (talk)


Tuesday 12 January



Morning chair: Julie Delon 
10:00am–10:40am
10:40am–11:20am
11:20am–12:00pm






Lunch break 






Afternoon chair: Joachim Weickert 
02:00pm–02:40pm
02:40pm–03:20pm






Coffee break with virtual pains au chocolat 




03:40pm–04:20pm
04:20pm–05:00pm  Soledad Villar (Johns Hopkins University, USA) – Dimensionality reduction, regularization, and generalization in overparameterized regressions (abstract) (slides) (talk)


Wednesday 13 January



Morning chair: Christoph Schnörr 
10:00am–10:40am
10:40am–11:20am






11:20am–12:00pm  Leon Bungert (Friedrich-Alexander University Erlangen-Nuremberg) – L-infinity eigenvalue problems on graphs and their continuum limit (abstract) (slides) (talk)






Lunch break 






Afternoon chair: Jalal Fadili 
02:00pm–02:40pm
02:40pm–03:20pm





Coffee break with virtual chouquettes 




03:40pm–04:20pm
04:20pm–05:00pm

Abstracts
Francis Bach – On the convergence of gradient descent for wide two-layer neural networks
Many supervised learning methods are naturally cast as optimization problems. For prediction models which are linear in their parameters, this often leads to convex problems for which many guarantees exist. Models which are non-linear in their parameters, such as neural networks, lead to non-convex optimization problems for which guarantees are harder to obtain. In this talk, I will consider two-layer neural networks with homogeneous activation functions where the number of hidden neurons tends to infinity, and show how qualitative convergence guarantees may be derived. I will also highlight open problems related to the quantitative behavior of gradient descent for such models. (Joint work with Lénaïc Chizat)

Coloma Ballester – Adversarial approaches for inverse problems: from image restoration and colorization to anomaly detection
Generative adversarial approaches are generative models that aim to model the probability distribution of certain data, so that new samples can be drawn from that distribution. In this talk, we will discuss these models together with their use in tackling several imaging problems. First, a method for image inpainting will be described, which also uses a variational formulation to incorporate perceptual information in the completion. Then, we will present a method for image colorization based on an adversarial strategy that captures geometric, perceptual and semantic information. Finally, an adversarial strategy for anomaly detection will be discussed.

Robert Beinert – Bilinear and Quadratic Inverse Problems: Non-Convex Regularization, Tensorial Liftings, Tensor-Free Algorithms
Bilinear and quadratic inverse problems arise in various formulations in imaging and physics, such as blind deconvolution, deautoconvolution, parallel imaging in MRI, and phase retrieval. Although the corresponding forward operators are only slightly non-linear, in many instances the requirements for applying the well-established non-linear regularization theory are not fulfilled. Our central idea to overcome this issue is the universal property, which enables us to lift bilinear/quadratic operators to linear mappings and immediately gives us access to the linear regularization theory. On the downside, a simple lifting introduces an additional non-convex rank-one constraint, which is as challenging to handle as the original non-linear problem. Therefore, we use the tensorial lifting indirectly and generalize the required concepts from convex analysis, such as subdifferentials and Bregman distances, to the new framework. At the same time, we allow the regularization functional to be non-convex in a manner comparable with the non-linearity. Using a source-wise representation, we derive convergence rates for the regularized solutions with respect to the actual noise level, which has, for instance, interesting consequences for the deautoconvolution. To solve a bilinear/quadratic inverse problem numerically, we adapt the methodology behind PhaseLift: we lift the problem to the tensor product and relax the non-convex rank-one constraint via the nuclear norm heuristic to obtain a convex minimization problem, which can be solved by any convex minimization method. Unfortunately, a direct implementation is most often impracticable due to the tremendous dimension of the tensor product. For first-order proximal algorithms such as primal-dual or forward-backward splittings, which turn out to be closely related to singular value thresholding, we can efficiently exploit low-rank representations.
On the basis of a subspace iteration or an augmented Lanczos process, the required computations on the tensor product can then be performed completely tensor-free. We apply the proposed method to the two-dimensional masked Fourier phase retrieval problem.

Joan Bruna Estrach – On Depth Separation for Neural Networks
Neural networks define functional spaces with appealing behavior in the high-dimensional regime, experimentally validated across many applications. However, these spaces are poorly understood from the approximation and optimization point of view, especially as the networks become deeper. In this talk, we will review existing results on approximation and optimization properties of deep networks, highlighting the techniques employed to establish so-called ‘depth separation’ results, which show exponential approximation lower bounds for the class of shallow networks together with polynomial upper bounds for deeper architectures. We will also present recent extensions that highlight the role of input geometry, such as homogeneity and symmetry, and conclude with a collection of open questions relating depth separation to scale separation, as well as optimization of deeper models.

Claire Boyer – Sampling rates for l1-synthesis
This work investigates the problem of signal recovery from undersampled noisy sub-Gaussian measurements under the assumption of a synthesis-based sparsity model. Solving the l1-synthesis basis pursuit makes it possible to simultaneously estimate a coefficient representation as well as the sought-for signal. However, due to linear dependencies within redundant dictionary atoms, it might be impossible to identify a specific representation vector, although the actual signal is still successfully recovered. We study the problem of signal recovery from a non-uniform, signal-dependent perspective. By utilizing results from linear inverse problems and convex geometry, we identify the sampling rate describing the phase transition of this problem, and propose a "tight" estimated upper bound. This is extracted from joint work with Maximilian März (TU Berlin), Jonas Kahn and Pierre Weiss (CNRS, Toulouse).

Blanche Buet – Weak and approximate curvatures of a measure: a varifold perspective
We propose a natural framework for the study of surfaces and their different discretizations, based on varifolds. Varifolds were introduced by Almgren to carry out the study of minimal surfaces. Though mainly used in the context of rectifiable sets, they turn out to be well suited to the study of discrete-type objects as well. While the varifold structure is flexible enough to adapt to both regular and discrete objects, it allows one to define variational notions of mean curvature and second fundamental form based on the divergence theorem. Thanks to a regularization of these weak formulations, we propose a notion of discrete curvature (actually a family of discrete curvatures associated with a regularization scale) relying only on the varifold structure. We prove convergence properties under a natural growth assumption: the scale of regularization must be large with respect to the accuracy of the discretization. We performed numerical computations of mean curvature and Gaussian curvature on point clouds in R^3 to illustrate this approach. Joint work with Gian Paolo Leonardi (Trento) and Simon Masnou (Lyon).

Leon Bungert – L-infinity eigenvalue problems on graphs and their continuum limit
In the first part of the talk, I will discuss non-linear eigenvalue problems and their relation to gradient flows in some generality. I will then discuss two different L-infinity eigenvalue problems in more detail, show relations to distance functions, and present graph-based discrete-to-continuum limits for these models using monotone schemes and Gamma-convergence.

Ninon Burgos – Improving the interpretability of computer-assisted analysis tools in neuroimaging
Neuroimaging offers an unmatched description of the brain’s structure and physiology, which explains its crucial role in the understanding, diagnosis, and treatment of neurological disorders. However, the information provided by neuroimages is not always easy to extract and interpret. I will present two approaches to make computer-assisted analysis tools more interpretable. The first explores how visualisation techniques can improve the interpretability of neural networks used for computer-aided diagnosis. The second is an anomaly detection framework that generates patient-specific maps summarising the pathology’s distribution in the brain.

Elsa Cazelles – Streaming computation of optimal weak transport barycenters
We present an alternative to the standard Wasserstein barycenter problem for probability distributions, based on optimal weak mass transport. The main advantage of our proposal, termed the weak barycenter, is that it provides a framework for aggregating (or combining) a set of probability measures, for which we can construct an iterative algorithm suitable for streaming distributions, including discrete ones. Moreover, the weak barycenter is defined in a way that is natural compared to the Wasserstein barycenter; in particular, both verify a similar fixed-point equation. We provide a theoretical analysis of the weak barycenter problem and of the algorithms to compute it in two settings: (i) for a finite number of measures and (ii) for a population of probability measures distributed according to a given law on the Wasserstein space. Our approach is illustrated on synthetic examples, validated on experiments using 2D real-world data, and compared to the classical optimal mass transport setting.

Pierre Chainais – DPPs for efficient random subsampling
Subsampling is a very common task in machine learning (e.g., feature selection, regression on a budget) and signal processing (e.g., discretization, interpolation, Monte Carlo methods). We study the potential of random subsampling for quadrature and interpolation. To this aim, we focus on functions living in RKHSs. We use nodes sampled either from a determinantal point process (DPP) or from continuous volume sampling (CVS). A projection DPP is parametrized by a kernel: we use a truncated and saturated version of the RKHS kernel. Thanks to this link between the two kernels, along with the DPP machinery, we prove relatively tight bounds on the quadrature error that depend on the spectrum of the RKHS kernel. We experimentally compare DPPs to existing kernel-based quadratures such as herding, Bayesian quadrature, and leverage score sampling. Numerical results confirm the interest of DPPs, and even suggest faster rates than our bounds in particular cases. Since the quadrature and interpolation problems are closely connected in RKHSs, we also show bounds on the quality of the approximation obtained when using DPPs or CVS for interpolation of functions in an RKHS. These theoretical guarantees motivate increasing interest in random subsampling techniques such as DPPs and CVS.

Lénaïc Chizat – Faster Wasserstein distance estimation with the Sinkhorn divergence
The squared Wasserstein distance is a natural quantity to compare probability distributions in a non-parametric setting. This quantity is usually estimated with the plug-in estimator, defined via a discrete optimal transport problem. It can be solved to ε-accuracy by adding an entropic regularization of order ε and using, for instance, Sinkhorn's algorithm. In this work, we propose instead to estimate it with the Sinkhorn divergence, which is also built on entropic regularization but includes debiasing terms. We show that, for smooth densities, this estimator has a comparable sample complexity but allows higher regularization levels, of order ε^{1/2}, which leads to improved computational complexity bounds and a speed-up in practice. We also propose and analyze an estimator based on Richardson extrapolation of the Sinkhorn divergence, which enjoys improved statistical and computational efficiency guarantees under a condition on the regularity of the approximation error that is in particular satisfied for Gaussian densities. This is joint work with Pierre Roussillon, Flavien Léger, François-Xavier Vialard and Gabriel Peyré.

Rémi Gribonval – Privacy and utility in compressive statistical learning
In compressive statistical learning, a dataset is summarized into a sketch vector capturing the information needed for a given learning task. Sketches are typically built as empirical averages of random features of the training data, leading to a finite-dimensional kernel mean embedding of the empirical probability distribution of the dataset. The talk will discuss how to ensure the utility of such embeddings for a given learning task, and how to make the sketch differentially private. The overall goal is to keep only the information needed to learn, but not more.

Matthias Hein – Robust Detection of Out-of-distribution Data
Current deep neural networks for image recognition do not know when they don't know: non-task-related images are assigned to one of the classes with high confidence. Thus the machine learning system cannot trigger human intervention or switch into a safe state when it is applied outside its specification. I will discuss our recent work to tackle this problem via certifiably adversarially robust out-of-distribution detection.

Michal Irani – “Deep Internal Learning”: Deep Learning with Zero Examples
In this talk I will show how complex visual inference tasks can be performed with deep learning, in a totally unsupervised way, by training on a single image: the test image itself. The strong recurrence of information inside a single natural image provides powerful internal examples which suffice for self-supervision of deep networks, without any prior examples or training data. This new paradigm gives rise to true “Zero-Shot Learning”. I will show the power of this approach on a variety of problems, including super-resolution, image segmentation, transparent layer separation, image dehazing, image retargeting, and more. I will further show how self-supervision can be used for “Mind-Reading” from very little fMRI data.

Jan Lellmann – Dynamical Optimal Transport and Functional Lifting
Functional lifting methods provide a tool for approximating solutions of difficult non-convex problems by embedding them into a larger space. One successful strategy is to embed into a space of probability measures, which is then equipped with a suitable metric based on Wasserstein distances. In this talk, we will show how this strategy can be viewed as a generalization of the Benamou-Brenier approach to dynamical optimal transport. By modifying the underlying continuity equation, this in particular makes it possible to incorporate functionals depending on higher-order derivatives and with non-scalar domain and range.

Adam Oberman – Regularization approaches to deep learning and a Liapunov approach to accelerated SGD
Stefania Petra – A geometric first-order multilevel method for discrete tomography
In this talk, I will present a geometric multilevel optimization approach, taking discrete tomography as a case study. In particular, our method is motivated by variational models that arise as discretizations of an underlying infinite-dimensional problem. Such problems naturally lead to a hierarchy of models at varying discretization levels. We employ multilevel optimization to take advantage of this hierarchy: while working at the fine level, we compute the search direction based on a coarse model. Importing a few concepts of information geometry into our setup, we propose a smoothing operator that only uses first-order information and incorporates constraints smoothly. We show that the proposed algorithm is well suited to the ill-posed reconstruction problem in discrete tomography and demonstrate its efficiency on several large-scale examples.

Thomas Pock – Learning optimal discretizations of the total variation
In this work, we study a general framework of discrete approximations of the total variation for image reconstruction problems. The framework, for which we can show consistency in the sense of Γ-convergence, unifies and extends several existing discretization schemes. In addition, we propose algorithms for learning discretizations of the total variation in order to achieve the best possible reconstruction quality for particular image reconstruction tasks. Interestingly, the learned discretizations differ significantly between the tasks, illustrating that there is no universal best discretization of the total variation.

Dejan Slepčev – Consistency of higher-order derivatives on random point clouds and applications to semi-supervised learning
Silvia Villa – Implicit regularization: from inverse problems to classification
Soledad Villar – Dimensionality reduction, regularization, and generalization in overparameterized regressions
Overparameterization in deep learning has been shown to be powerful: very large models can fit the training data perfectly and yet generalize well. The investigation of overparameterization brought back the study of linear models, which, like more complex models, show a “double descent” behavior. This involves two features: (1) the risk (out-of-sample prediction error) can grow arbitrarily when the number of samples n approaches the number of parameters p (from either side), and (2) for p > n the risk decreases with p, sometimes achieving a lower value than the lowest risk for p < n. The divergence of the risk at p = n is related to the condition number of the empirical covariance of the features. For this reason, it can be avoided with regularization. In this work we show that performing a PCA-based dimensionality reduction also avoids the divergence at p = n; we provide a finite upper bound for the variance of the estimator that decreases with p. This result is in contrast to recent work showing that a different form of dimensionality reduction, one based on the population covariance instead of the empirical covariance, does not avoid the divergence. We connect these results to an analysis of adversarial attacks, which become more effective as they raise the condition number of the empirical covariance of the features. We show that ordinary least squares is arbitrarily susceptible to data-poisoning attacks in the overparameterized regime, unlike in the underparameterized regime, and show how regularization and dimensionality reduction improve its robustness. We also translate the results on the highly overparameterized linear regression regime to Gaussian processes.
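The double-descent behavior described in Soledad Villar's abstract can be observed in a few lines of numpy: the risk of the minimum-norm least-squares fit blows up as p approaches n, while regressing on the top principal components of the empirical covariance does not. This is only a toy sketch of the general phenomenon, not the authors' experiments; all function names, dimensions and noise levels are our own arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def min_norm_risk(n, p, n_test=1000, sigma=0.5):
    """Out-of-sample excess risk of the minimum-norm least-squares estimator."""
    beta = rng.standard_normal(p) / np.sqrt(p)      # ground-truth coefficients
    X = rng.standard_normal((n, p))
    y = X @ beta + sigma * rng.standard_normal(n)
    beta_hat = np.linalg.pinv(X) @ y                # min-norm solution
    X_test = rng.standard_normal((n_test, p))
    return float(np.mean((X_test @ (beta_hat - beta)) ** 2))

def pca_reduced_risk(n, p, k, n_test=1000, sigma=0.5):
    """Risk after projecting the features onto the top-k principal components
    of the empirical covariance before regressing."""
    beta = rng.standard_normal(p) / np.sqrt(p)
    X = rng.standard_normal((n, p))
    y = X @ beta + sigma * rng.standard_normal(n)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:k].T                                    # leading k directions
    gamma = np.linalg.pinv(X @ V) @ y               # regress on reduced features
    X_test = rng.standard_normal((n_test, p))
    return float(np.mean((X_test @ V @ gamma - X_test @ beta) ** 2))

def avg(f, *args, reps=20):
    """Average the risk over independent draws to smooth the randomness."""
    return float(np.mean([f(*args) for _ in range(reps)]))

n = 50
for p in (40, 50, 60):
    # The min-norm risk peaks sharply at p = n; the PCA-reduced one does not.
    print(p, avg(min_norm_risk, n, p), avg(pca_reduced_risk, n, p, n // 2))
```

The peak at p = n comes from the near-singular empirical covariance; projecting onto the top principal components keeps only well-conditioned directions, which is the mechanism the abstract's finite variance bound formalizes.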
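The debiasing mentioned in Lénaïc Chizat's abstract can be sketched concretely: the Sinkhorn divergence replaces the entropic cost OT_ε(a, b) by OT_ε(a, b) − ½ OT_ε(a, a) − ½ OT_ε(b, b), which vanishes when the two measures coincide. Below is a minimal numpy version for discrete measures on the line, using only the transport-cost part ⟨P, C⟩ of the regularized objective; it is a generic textbook sketch with names of our own, not the authors' estimator or code.

```python
import numpy as np

def sinkhorn_cost(a, b, C, eps, n_iter=2000):
    """Sinkhorn iterations for entropic OT between histograms a and b with
    cost matrix C; returns the transport cost <P, C> of the resulting plan."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)       # alternate scaling updates
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]
    return float(np.sum(P * C))

def sinkhorn_divergence(x, y, a, b, eps):
    """Debiased Sinkhorn divergence between weighted 1-D point clouds
    (x, a) and (y, b) with squared-distance cost."""
    cost = lambda p, q: (p[:, None] - q[None, :]) ** 2
    return (sinkhorn_cost(a, b, cost(x, y), eps)
            - 0.5 * sinkhorn_cost(a, a, cost(x, x), eps)
            - 0.5 * sinkhorn_cost(b, b, cost(y, y), eps))

# Two uniform measures on the line: the divergence is positive when the
# supports differ and exactly zero when the inputs coincide.
x = np.array([0.0, 1.0]); y = np.array([2.0, 3.0])
a = np.array([0.5, 0.5]); b = np.array([0.5, 0.5])
print(sinkhorn_divergence(x, y, a, b, eps=0.5))
print(sinkhorn_divergence(x, x, a, a, eps=0.5))
```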
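Robert Beinert's abstract notes that first-order proximal algorithms for the lifted, nuclear-norm-relaxed problem are closely related to singular value thresholding, i.e. the proximal operator of the nuclear norm. As a reference point, here is the dense version of that operator (the talk's tensor-free variants would instead work on low-rank factors via subspace or Lanczos iterations); this is a standard textbook sketch, not the authors' implementation.

```python
import numpy as np

def svt(Y, tau):
    """Singular value thresholding: proximal operator of tau * ||.||_* at Y.
    Every singular value is shrunk by tau and those that hit zero are dropped,
    so the output has (weakly) lower rank than the input."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s = np.maximum(s - tau, 0.0)
    return (U * s) @ Vt

# On a diagonal matrix the effect is explicit: singular values (3, 1)
# thresholded at 2 become (1, 0), i.e. the output is rank one.
print(svt(np.diag([3.0, 1.0]), 2.0))
```

The rank-reducing behavior is what makes the nuclear norm heuristic a convex surrogate for the rank-one constraint mentioned in the abstract.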
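To make the synthesis model in Claire Boyer's abstract concrete (estimate a coefficient vector z and the signal x = Dz simultaneously): the abstract studies the constrained l1-synthesis basis pursuit, while the sketch below solves the closely related penalized variant with plain ISTA. The dictionary, measurement operator and parameters are toy choices of ours, not the setup of the paper.

```python
import numpy as np

def ista_l1_synthesis(A, D, y, lam, n_iter=1000):
    """ISTA on min_z 0.5 * ||A D z - y||^2 + lam * ||z||_1.
    Returns the coefficients z and the synthesized signal x = D z."""
    M = A @ D
    L = np.linalg.norm(M, 2) ** 2              # Lipschitz constant of the gradient
    z = np.zeros(M.shape[1])
    for _ in range(n_iter):
        w = z - (M.T @ (M @ z - y)) / L        # gradient step on the data term
        z = np.sign(w) * np.maximum(np.abs(w) - lam / L, 0.0)  # soft threshold
    return z, D @ z

# Toy problem: redundant random dictionary, sub-sampled Gaussian measurements
# of a signal synthesized from two atoms.
rng = np.random.default_rng(1)
D = rng.standard_normal((10, 15)); D /= np.linalg.norm(D, axis=0)
A = rng.standard_normal((8, 10)) / np.sqrt(8)
z_true = np.zeros(15); z_true[[2, 9]] = [1.5, -1.0]
y = A @ (D @ z_true)
z_hat, x_hat = ista_l1_synthesis(A, D, y, lam=1e-3)
print(np.linalg.norm(A @ x_hat - y))           # data residual shrinks
```

As the abstract points out, with a redundant D the recovered z_hat need not match z_true even when the signal x_hat is recovered well, since several coefficient vectors can synthesize the same signal.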
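As a baseline for what "a discretization of the total variation" means in Thomas Pock's abstract, here is the standard isotropic forward-difference scheme, one member of the family that the talk's framework unifies and learns over. This is a common textbook discretization written by us for illustration, not the learned discretization from the talk.

```python
import numpy as np

def tv_isotropic(u):
    """Isotropic total variation of a 2-D image using forward differences
    with replicate (Neumann) boundary handling."""
    dx = np.diff(u, axis=1, append=u[:, -1:])   # horizontal differences
    dy = np.diff(u, axis=0, append=u[-1:, :])   # vertical differences
    return float(np.sum(np.sqrt(dx ** 2 + dy ** 2)))

# A vertical step edge of height 1 across 4 rows has total variation 4,
# matching the continuous perimeter-times-jump value.
u = np.zeros((4, 4)); u[:, 2:] = 1.0
print(tv_isotropic(u))
```

Other schemes (anisotropic, staggered-grid, Raviart-Thomas-type) assign different values to the same image, which is exactly why the choice of discretization matters and can be learned per task.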

