Cédric Gerbelot

Briefly
I am a Courant Instructor at the Courant Institute of Mathematical Sciences, NYU. I obtained my PhD at the end of August 2022 at Ecole Normale Supérieure de Paris under the supervision of Florent Krzakala (EPFL) and Marc Lelarge (INRIA, ENS). I work at the interface between theoretical machine learning and mathematical physics. More precisely, I am interested in high-dimensional probability, optimization and mathematical methods inspired by spin glass theory.
Here is a CV.
Contact
Email: cedric [dot] gerbelot [at] cims [dot] nyu [dot] edu
Physical address: Courant Institute, 251 Mercer Street, Office 925, New York 10012

Publications and Preprints
You can also find my publications on Google Scholar, and some code on GitHub.
Ben Arous, G., Gerbelot, C., Piccolo, V. High-dimensional optimization for multi-spike tensor PCA, Preprint, 2024. [arXiv]
Gerbelot, C., Troiani, E., Mignacco, F., Krzakala, F., Zdeborova, L. Rigorous dynamical mean field theory for stochastic gradient descent methods, SIAM Journal on Mathematics of Data Science (SIMODS), 2024. [arXiv]
Abstract: We prove closed-form equations for the exact high-dimensional asymptotics of a family of first order gradient-based methods, learning an estimator (e.g. M-estimator, shallow neural network, ...) from observations on Gaussian data with empirical risk minimization. This includes widely used algorithms such as stochastic gradient descent (SGD) or Nesterov acceleration. The obtained equations match those resulting from the discretization of dynamical mean-field theory (DMFT) equations from statistical physics when applied to gradient flow. Our proof method allows us to give an explicit description of how memory kernels build up in the effective dynamics, and to include non-separable update functions, allowing datasets with non-identity covariance matrices. Finally, we provide numerical implementations of the equations for SGD with generic extensive batch-size and with constant learning rates.
Daniels, M., Gerbelot, C., Krzakala, F., Zdeborova, L. Multilayer State Evolution Under Random Convolutional Design, Advances in Neural Information Processing Systems (NeurIPS), 2022. [arXiv]
Abstract: Signal recovery under generative neural network priors has emerged as a promising direction in statistical inference and computational imaging. Theoretical analysis of reconstruction algorithms under generative priors is, however, challenging. For generative priors with fully connected layers and Gaussian i.i.d. weights, this was achieved by the multi-layer approximate message passing (ML-AMP) algorithm via a rigorous state evolution. However, practical generative priors are typically convolutional, allowing for computational benefits and inductive biases, and so the Gaussian i.i.d. weight assumption is very limiting. In this paper, we overcome this limitation and establish the state evolution of ML-AMP for random convolutional layers. We prove in particular that random convolutional layers belong to the same universality class as Gaussian matrices. Our proof technique is of independent interest as it establishes a mapping between convolutional matrices and spatially coupled sensing matrices used in coding theory.
Loureiro, B., Gerbelot, C., Refinetti, M., Sicuro, G., Krzakala, F. Fluctuations, Bias, Variance and Ensemble of Learners: Exact Asymptotics for Convex Losses in High-Dimension, International Conference on Machine Learning (ICML), 2022. [arXiv]
Abstract: From the sampling of data to the initialisation of parameters, randomness is ubiquitous in modern Machine Learning practice. Understanding the statistical fluctuations engendered by the different sources of randomness in prediction is therefore key to understanding robust generalisation. In this manuscript we develop a quantitative and rigorous theory for the study of fluctuations in an ensemble of generalised linear models trained on different, but correlated, features in high-dimensions. In particular, we provide a complete description of the asymptotic joint distribution of the empirical risk minimiser for generic convex loss and regularisation in the high-dimensional limit. Our result encompasses a rich set of classification and regression tasks, such as the lazy regime of overparametrised neural networks, or equivalently the random features approximation of kernels. While allowing to study directly the mitigating effect of ensembling (or bagging) on the bias-variance decomposition of the test error, our analysis also helps disentangle the contribution of statistical fluctuations, and the singular role played by the interpolation threshold that are at the roots of the "double-descent" phenomenon.
Cornacchia, E., Mignacco, F., Veiga, R., Gerbelot, C., Loureiro, B., Zdeborova, L. Learning Curves for the Multi-class Teacher-student Perceptron, 2022, Machine Learning: Science and Technology. [arXiv]
Abstract: One of the most classical results in high-dimensional learning theory provides a closed-form expression for the generalisation error of binary classification with the single-layer teacher-student perceptron on i.i.d. Gaussian inputs. Both Bayes-optimal estimation and empirical risk minimisation (ERM) were extensively analysed for this setting. At the same time, a considerable part of modern machine learning practice concerns multi-class classification. Yet, an analogous analysis for the corresponding multi-class teacher-student perceptron was missing. In this manuscript we fill this gap by deriving and evaluating asymptotic expressions for both the Bayes-optimal and ERM generalisation errors in the high-dimensional regime.
Gerbelot, C. and Berthier, R. Graph-based Approximate Message Passing Iterations, Information and Inference: A Journal of the IMA, 2023. [arXiv]
Abstract: Approximate message passing (AMP) algorithms have become an important element of high-dimensional statistical inference, mostly due to their adaptability and concentration properties, the state evolution (SE) equations. This is demonstrated by the growing number of new iterations proposed for increasingly complex problems, ranging from multi-layer inference to low-rank matrix estimation with elaborate priors. In this paper, we address the following questions: is there a structure underlying all AMP iterations that unifies them in a common framework? Can we use such a structure to give a modular proof of state evolution equations, adaptable to new AMP iterations without reproducing each time the full argument? We propose an answer to both questions, showing that AMP instances can be generically indexed by an oriented graph. This enables to give a unified interpretation of these iterations, independent from the problem they solve, and a way of composing them arbitrarily. We then show that all AMP iterations indexed by such a graph admit rigorous SE equations, extending the reach of previous proofs, and proving a number of recent heuristic derivations of those equations. Our proof naturally includes non-separable functions and we show how existing refinements, such as spatial coupling or matrix-valued variables, can be combined with our framework.
Loureiro, B., Sicuro, G., Gerbelot, C., Pacco, A., Krzakala, F., Zdeborova, L. Learning Gaussian Mixtures with Generalized Linear Models: Precise Asymptotics in High-dimensions, Advances in Neural Information Processing Systems (NeurIPS) 2021, Spotlight presentation. [arXiv]
Abstract: Generalised linear models for multi-class classification problems are one of the fundamental building blocks of modern machine learning tasks. In this manuscript, we characterise the learning of a mixture of Gaussians with generic means and covariances via empirical risk minimisation (ERM) with any convex loss and regularisation. In particular, we prove exact asymptotics characterising the ERM estimator in high-dimensions, extending several previous results about Gaussian mixture classification in the literature. We exemplify our result in two tasks of interest in statistical learning: a) classification for a mixture with sparse means, where we study the efficiency of ℓ1 penalty with respect to ℓ2; b) max-margin multi-class classification, where we characterise the phase transition on the existence of the multi-class logistic maximum likelihood estimator for K > 2. Finally, we discuss how our theory can be applied beyond the scope of synthetic data, showing that in different cases Gaussian mixtures capture closely the learning curve of classification tasks in real data sets.
Loureiro, B., Gerbelot, C., Cui, H., Goldt, S., Mezard, M., Krzakala, F., Zdeborova, L. Capturing the learning curves of realistic data sets with a teacher-student model, Advances in Neural Information Processing Systems (NeurIPS) 2021. [arXiv]
Abstract: Teacher-student models provide a framework in which the typical-case performance of high-dimensional supervised learning can be described in closed form. The assumptions of Gaussian i.i.d. input data underlying the canonical teacher-student model may, however, be perceived as too restrictive to capture the behaviour of realistic data sets. In this paper, we introduce a Gaussian covariate generalisation of the model where the teacher and student can act on different spaces, generated with fixed, but generic feature maps. While still solvable in a closed form, this generalization is able to capture the learning curves for a broad range of realistic data sets, thus redeeming the potential of the teacher-student framework. Our contribution is then twofold: First, we prove a rigorous formula for the asymptotic training loss and generalisation error. Second, we present a number of situations where the learning curve of the model captures the one of a realistic data set learned with kernel regression and classification, with out-of-the-box feature maps such as random projections or scattering transforms, or with pre-learned ones, such as the features learned by training multi-layer neural networks. We discuss both the power and the limitations of the framework.
Gerbelot, C., Abbara, A., & Krzakala, F. Asymptotic Errors for Teacher-Student Convex Generalized Linear Models (or: How to Prove Kabashima's Replica Formula), 2020, IEEE Transactions on Information Theory. [arXiv]
Abstract: There has been a recent surge of interest in the study of asymptotic reconstruction performance in various cases of generalized linear estimation problems in the teacher-student setting, especially for the case of i.i.d. standard normal matrices. Here, we go beyond these matrices, and prove an analytical formula for the reconstruction performance of convex generalized linear models with rotationally-invariant data matrices with arbitrary bounded spectrum, rigorously confirming a conjecture originally derived using the replica method from statistical physics. The formula includes many problems such as compressed sensing or sparse logistic classification. The proof is achieved by leveraging on message passing algorithms and the statistical properties of their iterates, allowing to characterize the asymptotic empirical distribution of the estimator. Our proof is crucially based on the construction of converging sequences of an oracle multi-layer vector approximate message passing algorithm, where the convergence analysis is done by checking the stability of an equivalent dynamical system. We illustrate our claim with numerical examples on mainstream learning methods such as sparse logistic regression and linear support vector classifiers, showing excellent agreement between moderate size simulation and the asymptotic prediction.
Gerbelot, C., Abbara, A., & Krzakala, F. Asymptotic errors for convex penalized linear regression beyond Gaussian matrices, Conference On Learning Theory (COLT) 2020. PMLR, vol. 125, 1682-1713. [journal, arXiv]
Abstract: We consider the problem of learning a coefficient vector x0 in R^N from noisy linear observations y = F x0 + w in R^M in the high dimensional limit M, N → ∞ with α ≡ M/N fixed. We provide a rigorous derivation of an explicit formula (first conjectured using heuristic methods from statistical physics) for the asymptotic mean squared error obtained by penalized convex regression estimators such as the LASSO or the elastic net, for a class of very generic random matrices corresponding to rotationally invariant data matrices with arbitrary spectrum. The proof is based on a convergence analysis of an oracle version of vector approximate message-passing (oracle-VAMP) and on the properties of its state evolution equations. Our method leverages on and highlights the link between vector approximate message-passing, Douglas-Rachford splitting and proximal descent algorithms, extending previous results obtained with i.i.d. matrices for a large class of problems. We illustrate our results on some concrete examples and show that even though they are asymptotic, our predictions agree remarkably well with numerics even for very moderate sizes.
Ilton, M., Couchman, M. M., Gerbelot, C., Benzaquen, M., Fowler, P. D., Stone, H. A., … & Salez, T. Capillary leveling of freestanding liquid nanofilms, 2016, Physical Review Letters, 117(16), 167801. [journal, arXiv]
Abstract: We report on the capillary-driven levelling of a topographical perturbation at the surface of a freestanding liquid nanofilm. The width of a stepped surface profile is found to evolve as the square root of time. The hydrodynamic model is in excellent agreement with the experimental data. In addition to exhibiting an analogy with diffusive processes, this novel system serves as a precise nanoprobe for the rheology of liquids at interfaces in a configuration that avoids substrate effects.
Some recent talks
Joint Statistical Meeting (JSM, Portland USA), invited speaker, August 2024
EPFL workshop on Machine Learning Theory (Lemanth), invited speaker, May 2024
NYU Students and Postdocs Probability seminar, April 2024
Harvard Probability and Statistics seminar (Probabilitas seminar series), March 2024
Cargese Summer School on Statistical Physics and Machine Learning (invited speaker), 2023
Porquerolles Workshop on High Dimensional Statistics and Random Matrices (short talk), 2023
Princeton Workshop on Physics for Neural Networks (invited speaker), 2023
NYU CDS group seminar, November 2022
NYU Courant Postdoc seminar, October 2022
Les Houches Summer School on Statistical Physics and Machine Learning, July 2022
INRIA/DYOGENE group seminar, April 2022
Reviewing
Journal of Statistical Mechanics, Theory and Experiment (JSTAT)
IEEE Transactions on Information Theory
The Annals of Statistics
Information and Inference
Journal of Machine Learning Research (JMLR)
Advances in Neural Information Processing Systems (NeurIPS, 2021/2022)
International Conference on Machine Learning (ICML, 2022/2023)
Teaching
Undergraduate Mathematical Statistics, NYU Courant, Fall 2024
Graduate Essentials of Probability, NYU Courant, Spring 2023
Graduate Computational Statistics, NYU Courant and Tandon School of Engineering, Fall 2022
