Cédric Gerbelot

Briefly

I am an assistant professor (chaire de professeur junior) in the Unité de Mathématiques Pures et Appliquées at Ecole Normale Supérieure de Lyon.

I work at the interface between machine learning theory, probability and mathematical physics. More precisely, I am interested in high-dimensional probability, optimization and mathematical methods inspired by spin glass theory.

I obtained my PhD in 2022 at Ecole Normale Supérieure de Paris under the supervision of Florent Krzakala (EPFL) and Marc Lelarge (INRIA, ENS). I was then a Courant Instructor for two years at the Courant Institute of Mathematical Sciences, NYU, where I worked with Gerard Ben Arous .

Here is a CV.

Contact

E-mail: cedric [dot] gerbelot-barrillon [at] ens-lyon [dot] fr
Physical address: UMPA, ENS Lyon, 46 Allée d'Italie
- 69007 Lyon

Selected Publications

You can also find my publications on Google Scholar, and some code on Github.

Ben Arous, G., Gerbelot, C., Piccolo, V. (alphabetical order) High-dimensional optimization for multi-spike tensor PCA, Preprint, 2024.
[arXiv] [Show Abstract]
Abstract: A series of three papers on the multi-spike tensor estimation in high-dimension using Langevin dynamics, gradient flow and online stochastic gradient descent.

Gerbelot, C., Troiani, E., Mignacco, F., Krzakala, F., Zdeborova, L. Rigorous dynamical mean field theory for stochastic gradient descent methods, SIAM Journal on Mathematics of Data Science (SIMODS), 2024.
[arXiv] [Show Abstract]
Abstract: We prove closed-form equations for the exact high-dimensional asymptotics of a family of first order gradient-based methods, learning an estimator (e.g. M-estimator, shallow neural network, ...) from observations on Gaussian data with empirical risk minimization. This includes widely used algorithms such as stochastic gradient descent (SGD) or Nesterov acceleration. The obtained equations match those resulting from the discretization of dynamical mean-field theory (DMFT) equations from statistical physics when applied to gradient flow. Our proof method allows us to give an explicit description of how memory kernels build up in the effective dynamics, and to include non-separable update functions, allowing datasets with non-identity covariance matrices. Finally, we provide numerical implementations of the equations for SGD with generic extensive batch-size and with constant learning rates.

Gerbelot, C. and Berthier, R. Graph-based Approximate Message Passing Iterations, Information and Inference : A Journal of the IMA, 2023.
[arXiv] [Show Abstract]
Abstract: Approximate-message passing (AMP) algorithms have become an important element of high-dimensional statistical inference, mostly due to their adaptability and concentration properties, the state evolution (SE) equations. This is demonstrated by the growing number of new iterations proposed for increasingly complex problems, ranging from multi-layer inference to low-rank matrix estimation with elaborate priors. In this paper, we address the following questions: is there a structure underlying all AMP iterations that unifies them in a common framework? Can we use such a structure to give a modular proof of state evolution equations, adaptable to new AMP iterations without reproducing each time the full argument ? We propose an answer to both questions, showing that AMP instances can be generically indexed by an oriented graph. This enables to give a unified interpretation of these iterations, independent from the problem they solve, and a way of composing them arbitrarily. We then show that all AMP iterations indexed by such a graph admit rigorous SE equations, extending the reach of previous proofs, and proving a number of recent heuristic derivations of those equations. Our proof naturally includes non-separable functions and we show how existing refinements, such as spatial coupling or matrix-valued variables, can be combined with our framework.

Loureiro, B., Sicuro, G., Gerbelot, C.,Pacco, A., Krzakala, F, Zdeborova, L. Learning Gaussian Mixtures with Generalized Linear Models : Precise Asymptotics in High-dimensions, Advances in Neural Information Processing Systems (Neurips) 2021, Spotlight presentation.
[arXiv] [Show Abstract]
Abstract: Generalised linear models for multi-class classification problems are one of the fundamental building blocks of modern machine learning tasks. In this manuscript, we characterise the learning of a mixture of Gaussians with generic means and covariances via empirical risk minimisation (ERM) with any convex loss and regularisation. In particular, we prove exact asymptotics characterising the ERM estimator in high-dimensions, extending several previous results about Gaussian mixture classification in the literature. We exemplify our result in two tasks of interest in statistical learning: a) classification for a mixture with sparse means, where we study the efficiency of penalty with respect to ; b) max-margin multi-class classification, where we characterise the phase transition on the existence of the multi-class logistic maximum likelihood estimator for . Finally, we discuss how our theory can be applied beyond the scope of synthetic data, showing that in different cases Gaussian mixtures capture closely the learning curve of classification tasks in real data sets.

Loureiro, B., Gerbelot, C, Cui, H, Goldt, S, Mezard, M, Krzakala, F, Zdeborova, L. Capturing the learning curves of realistic data sets with a teacher-student model, Advances in Neural Information Processing Systems (Neurips) 2021.
[arXiv] [Show Abstract]
Abstract: Teacher-student models provide a framework in which the typical-case performance of high-dimensional supervised learning can be described in closed form. The assumptions of Gaussian i.i.d. input data underlying the canonical teacher-student model may, however, be perceived as too restrictive to capture the behaviour of realistic data sets. In this paper, we introduce a Gaussian covariate generalisation of the model where the teacher and student can act on different spaces, generated with fixed, but generic feature maps. While still solvable in a closed form, this generalization is able to capture the learning curves for a broad range of realistic data sets, thus redeeming the potential of the teacher-student framework. Our contribution is then two-fold: First, we prove a rigorous formula for the asymptotic training loss and generalisation error. Second, we present a number of situations where the learning curve of the model captures the one of a emph{realisticdata set} learned with kernel regression and classification, with out-of-the-box feature maps such as random projections or scattering transforms, or with pre-learned ones - such as the features learned by training multi-layer neural networks. We discuss both the power and the limitations of the framework.

Gerbelot, C., Abbara, A., & Krzakala, F. Asymptotic Errors for Teacher-Student Convex Generalized Linear Models (or : How to Prove Kabashima's Replica Formula), 2020, IEEE Transactions on Information Theory.
[arXiv] [Show Abstract]
Abstract: There has been a recent surge of interest in the study of asymptotic reconstruction performance in various cases of generalized linear estimation problems in the teacher-student setting, especially for the case of i.i.d standard normal matrices. Here, we go beyond these matrices, and prove an analytical formula for the reconstruction performance of convex generalized linear models with rotationally-invariant data matrices with arbitrary bounded spectrum, rigorously confirming a conjecture originally derived using the replica method from statistical physics. The formula includes many problems such as compressed sensing or sparse logistic classification. The proof is achieved by leveraging on message passing algorithms and the statistical properties of their iterates, allowing to characterize the asymptotic empirical distribution of the estimator. Our proof is crucially based on the construction of converging sequences of an oracle multi-layer vector approximate message passing algorithm, where the convergence analysis is done by checking the stability of an equivalent dynamical system. We illustrate our claim with numerical examples on mainstream learning methods such as sparse logistic regression and linear support vector classifiers, showing excellent agreement between moderate size simulation and the asymptotic prediction.

Some recent talks

NYU Courant Probability Seminar, October 2024
Joint Statistical Meeting (JSM, Portland USA), invited speaker, August 2024
EPFL workshop on Machine Learning Theory (Lemanth), invited speaker, May 2024
NYU Students and Postdocs Probability seminar, April 2024
Harvard Probability and Statistics seminar (Probabilitas seminar series), March 2024
Cargese Summer School on Statistical Physics and Machine Learning (invited speaker), 2023
Porquerolles Workshop on High Dimensional Statistics and Random Matrices (short talk), 2023
Princeton Workshop on Physics for Neural Networks (invited speaker), 2023
NYU CDS group seminar, November 2022
NYU Courant Postdoc seminar, October 2022
Les Houches Summer School on Statistical Physics and Machine Learning, July 2022
INRIA/DYOGENE group seminar, April 2022

Reviewing

IEEE Transactions on Information Theory
The Annals of Statistics
Information and Inference
Journal of Machine Learning Research (JMLR)
Journal of Statistical Mechanics, Theory and Experiment (JSTAT)
Physical Review E
Advances in Neural Information Processing Systems (Neurips, 2021/2022)
International Conference on Machine Learning (ICML, 2022/2023)

Teaching

Undergraduate Mathematical Statistics - NYU Courant - Fall 2024
Graduate Essentials of Probability - NYU Courant - Spring 2023
Graduate Computational Statistics - NYU Courant and Tandon School of Engineering - Fall 2022