Cédric Gerbelot

 

Briefly

I am a Courant Instructor at the Courant Institute of Mathematical Sciences, NYU. I obtained my PhD at the end of August 2022 from the École Normale Supérieure in Paris, under the supervision of Florent Krzakala (EPFL) and Marc Lelarge (INRIA, ENS). I work at the interface between theoretical machine learning and mathematical physics. More precisely, I am interested in high-dimensional probability, optimization, and mathematical methods inspired by spin glass theory.

Here is a CV.

Contact

  • E-mail: cedric [dot] gerbelot [at] cims [dot] nyu [dot] edu

  • Physical address: Courant Institute, 251 Mercer Street, Office 925, New York, NY 10012

Publications and Preprints

You can also find my publications on Google Scholar, and some code on Github.

  • Ben Arous, G., Gerbelot, C., Piccolo, V. High-dimensional optimization for multi-spike tensor PCA, Preprint, 2024.
    [arXiv]

  • Gerbelot, C., Troiani, E., Mignacco, F., Krzakala, F., Zdeborova, L. Rigorous dynamical mean field theory for stochastic gradient descent methods, SIAM Journal on Mathematics of Data Science (SIMODS), 2024.
    [arXiv]

    Abstract: We prove closed-form equations for the exact high-dimensional asymptotics of a family of first order gradient-based methods, learning an estimator (e.g. M-estimator, shallow neural network, ...) from observations on Gaussian data with empirical risk minimization. This includes widely used algorithms such as stochastic gradient descent (SGD) or Nesterov acceleration. The obtained equations match those resulting from the discretization of dynamical mean-field theory (DMFT) equations from statistical physics when applied to gradient flow. Our proof method allows us to give an explicit description of how memory kernels build up in the effective dynamics, and to include non-separable update functions, allowing datasets with non-identity covariance matrices. Finally, we provide numerical implementations of the equations for SGD with generic extensive batch-size and with constant learning rates.
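
    For illustration, here is a minimal sketch of the kind of dynamics the result covers: mini-batch SGD with a constant learning rate on a ridge-regularized least-squares risk over Gaussian data. This is not the paper's implementation; the dimensions, batch fraction, step size and penalty below are arbitrary illustrative choices.

    ```python
    # Toy mini-batch SGD on ridge-regularized least squares with Gaussian data.
    # All sizes and hyperparameters are arbitrary illustrative choices.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 2000, 1000                              # samples, dimension (alpha = n/d = 2)
    X = rng.standard_normal((n, d)) / np.sqrt(d)   # Gaussian data
    w_star = rng.standard_normal(d)                # planted signal
    y = X @ w_star + 0.1 * rng.standard_normal(n)  # noisy observations

    lam, lr, b = 0.1, 0.2, n // 10                 # ridge penalty, learning rate, extensive batch size
    w = np.zeros(d)
    for t in range(500):
        idx = rng.choice(n, size=b, replace=False)              # fresh mini-batch at each step
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / b + lam * w   # stochastic gradient of the regularized risk
        w -= lr * grad

    print("training risk:", 0.5 * np.mean((X @ w - y) ** 2))
    ```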

  • Daniels, M., Gerbelot, C., Krzakala, F., Zdeborova, L. Multi-layer State Evolution Under Random Convolutional Design, Advances in Neural Information Processing Systems (NeurIPS), 2022.
    [arXiv]

    Abstract: Signal recovery under generative neural network priors has emerged as a promising direction in statistical inference and computational imaging. Theoretical analysis of reconstruction algorithms under generative priors is, however, challenging. For generative priors with fully connected layers and Gaussian i.i.d. weights, this was achieved by the multi-layer approximate message passing (ML-AMP) algorithm via a rigorous state evolution. However, practical generative priors are typically convolutional, allowing for computational benefits and inductive biases, and so the Gaussian i.i.d. weight assumption is very limiting. In this paper, we overcome this limitation and establish the state evolution of ML-AMP for random convolutional layers. We prove in particular that random convolutional layers belong to the same universality class as Gaussian matrices. Our proof technique is of independent interest as it establishes a mapping between convolutional matrices and spatially coupled sensing matrices used in coding theory.
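
    To make the notion of a random convolutional design concrete, here is a small sketch that assembles a multi-channel random convolution into a dense structured matrix (a block matrix of circulants generated by i.i.d. Gaussian filters). The exact ensemble analysed in the paper (channel counts, filter statistics, normalization) may differ; this only conveys the structure.

    ```python
    # A random convolutional layer written as a dense structured matrix:
    # block (i, j) is a circulant built from a small i.i.d. Gaussian filter,
    # convolving input channel j into output channel i. Sizes are illustrative.
    import numpy as np
    from scipy.linalg import circulant

    rng = np.random.default_rng(0)
    n, k = 64, 5                  # signal length per channel, filter width
    c_out, c_in = 3, 2            # output / input channels

    def conv_block(filt, n):
        """Dense n x n circular-convolution matrix whose first column carries the filter."""
        col = np.zeros(n)
        col[:len(filt)] = filt
        return circulant(col)

    blocks = [[conv_block(rng.standard_normal(k) / np.sqrt(k * c_in), n)
               for _ in range(c_in)] for _ in range(c_out)]
    W = np.block(blocks)          # shape (c_out * n, c_in * n)
    print(W.shape)
    ```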

  • Loureiro, B., Gerbelot, C., Refinetti, M., Sicuro, G., Krzakala, F. Fluctuations, Bias, Variance and Ensemble of Learners: Exact Asymptotics for Convex Losses in High-Dimension, International Conference on Machine Learning (ICML), 2022.
    [arXiv]

    Abstract: From the sampling of data to the initialisation of parameters, randomness is ubiquitous in modern Machine Learning practice. Understanding the statistical fluctuations engendered by the different sources of randomness in prediction is therefore key to understanding robust generalisation. In this manuscript we develop a quantitative and rigorous theory for the study of fluctuations in an ensemble of generalised linear models trained on different, but correlated, features in high-dimensions. In particular, we provide a complete description of the asymptotic joint distribution of the empirical risk minimiser for generic convex loss and regularisation in the high-dimensional limit. Our result encompasses a rich set of classification and regression tasks, such as the lazy regime of overparametrised neural networks, or equivalently the random features approximation of kernels. While allowing us to study directly the mitigating effect of ensembling (or bagging) on the bias-variance decomposition of the test error, our analysis also helps disentangle the contribution of statistical fluctuations, and the singular role played by the interpolation threshold that are at the roots of the "double-descent" phenomenon.
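
    As a toy illustration of ensembling in this spirit (not the paper's exact model), the sketch below trains K logistic regressions on learner-specific random-feature maps of the same Gaussian inputs and averages their scores; the sizes, the tanh nonlinearity and K are arbitrary choices.

    ```python
    # Ensemble of K logistic regressions on correlated random features.
    # Sizes, the tanh feature map and K are arbitrary illustrative choices.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    d, p, n, n_test, K = 100, 200, 400, 2000, 10
    w_star = rng.standard_normal(d)

    def sample(m):
        X = rng.standard_normal((m, d))
        return X, np.sign(X @ w_star)

    Xtr, ytr = sample(n)
    Xte, yte = sample(n_test)

    scores = np.zeros(n_test)
    for _ in range(K):
        F = rng.standard_normal((d, p)) / np.sqrt(d)   # learner-specific random features
        clf = LogisticRegression(C=1.0, max_iter=2000).fit(np.tanh(Xtr @ F), ytr)
        scores += clf.decision_function(np.tanh(Xte @ F))

    print("ensemble test error:", np.mean(np.sign(scores) != yte))
    ```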

  • Cornacchia, E., Mignacco, F., Veiga, R., Gerbelot, C., Loureiro, B., Zdeborova, L. Learning Curves for the Multi-class Teacher-student Perceptron, Machine Learning: Science and Technology, 2022.
    [arXiv]

    Abstract: One of the most classical results in high-dimensional learning theory provides a closed-form expression for the generalisation error of binary classification with the single-layer teacher-student perceptron on i.i.d. Gaussian inputs. Both Bayes-optimal estimation and empirical risk minimisation (ERM) were extensively analysed for this setting. At the same time, a considerable part of modern machine learning practice concerns multi-class classification. Yet, an analogous analysis for the corresponding multi-class teacher-student perceptron was missing. In this manuscript we fill this gap by deriving and evaluating asymptotic expressions for both the Bayes-optimal and ERM generalisation errors in the high-dimensional regime.
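
    For orientation, one standard way to write down the multi-class teacher-student perceptron and its ERM estimator is sketched below; conventions such as noise, normalization and the choice of loss may differ from those of the paper.

    ```latex
    % Gaussian inputs x_i, a fixed teacher W^* in R^{K x d}, and argmax labels:
    %   y_i = \arg\max_{k \le K} (W^* x_i)_k .
    % The student is obtained by regularized empirical risk minimization:
    \hat{W} \;=\; \underset{W \in \mathbb{R}^{K \times d}}{\arg\min}\;
                  \sum_{i=1}^{n} \ell\big(y_i, W x_i\big) \;+\; \frac{\lambda}{2}\,\lVert W \rVert_F^2 .
    ```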

  • Gerbelot, C. and Berthier, R. Graph-based Approximate Message Passing Iterations, Information and Inference: A Journal of the IMA, 2023.
    [arXiv]

    Abstract: Approximate message passing (AMP) algorithms have become an important element of high-dimensional statistical inference, mostly due to their adaptability and concentration properties, the state evolution (SE) equations. This is demonstrated by the growing number of new iterations proposed for increasingly complex problems, ranging from multi-layer inference to low-rank matrix estimation with elaborate priors. In this paper, we address the following questions: is there a structure underlying all AMP iterations that unifies them in a common framework? Can we use such a structure to give a modular proof of state evolution equations, adaptable to new AMP iterations without reproducing the full argument each time? We propose an answer to both questions, showing that AMP instances can be generically indexed by an oriented graph. This enables us to give a unified interpretation of these iterations, independent of the problem they solve, and a way of composing them arbitrarily. We then show that all AMP iterations indexed by such a graph admit rigorous SE equations, extending the reach of previous proofs, and proving a number of recent heuristic derivations of those equations. Our proof naturally includes non-separable functions and we show how existing refinements, such as spatial coupling or matrix-valued variables, can be combined with our framework.
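
    For readers unfamiliar with state evolution, the simplest instance covered by this kind of framework is recalled below (a single symmetric Gaussian matrix with separable nonlinearities); the paper's graph-indexed setting is much more general.

    ```latex
    % W symmetric with i.i.d. N(0, 1/n) entries above the diagonal, f_t Lipschitz:
    x^{t+1} = W f_t(x^{t}) - b_t\, f_{t-1}(x^{t-1}),
    \qquad b_t = \frac{1}{n} \sum_{i=1}^{n} f_t'\big(x_i^{t}\big).
    % State evolution: the empirical distribution of x^{t+1} is asymptotically N(0, \sigma_{t+1}^2), with
    \sigma_{t+1}^{2} = \mathbb{E}\big[ f_t(\sigma_t Z)^{2} \big], \qquad Z \sim \mathcal{N}(0, 1).
    ```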

  • Loureiro, B., Sicuro, G., Gerbelot, C., Pacco, A., Krzakala, F., Zdeborova, L. Learning Gaussian Mixtures with Generalized Linear Models: Precise Asymptotics in High-dimensions, Advances in Neural Information Processing Systems (NeurIPS), 2021, Spotlight presentation.
    [arXiv]

    Abstract: Generalised linear models for multi-class classification problems are one of the fundamental building blocks of modern machine learning tasks. In this manuscript, we characterise the learning of a mixture of K Gaussians with generic means and covariances via empirical risk minimisation (ERM) with any convex loss and regularisation. In particular, we prove exact asymptotics characterising the ERM estimator in high-dimensions, extending several previous results about Gaussian mixture classification in the literature. We exemplify our result in two tasks of interest in statistical learning: a) classification for a mixture with sparse means, where we study the efficiency of l_1 penalty with respect to l_2; b) max-margin multi-class classification, where we characterise the phase transition on the existence of the multi-class logistic maximum likelihood estimator for K>2. Finally, we discuss how our theory can be applied beyond the scope of synthetic data, showing that in different cases Gaussian mixtures capture closely the learning curve of classification tasks in real data sets.
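
    A toy numerical illustration of task (a), classification of a Gaussian mixture with sparse means under l_1 versus l_2 penalties, is sketched below; the dimensions, sparsity level and regularization strength are arbitrary choices and not those studied in the paper.

    ```python
    # Two-component Gaussian mixture with a sparse mean, l1 vs l2 logistic regression.
    # Dimensions, sparsity and penalty strength are arbitrary illustrative choices.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    d, n, n_test, s = 500, 300, 5000, 25
    mu = np.zeros(d)
    mu[:s] = 1.0 / np.sqrt(s)                      # sparse mean direction

    def sample(m):
        y = rng.choice([-1, 1], size=m)
        return y[:, None] * mu + rng.standard_normal((m, d)), y

    Xtr, ytr = sample(n)
    Xte, yte = sample(n_test)
    for penalty, solver in [("l1", "liblinear"), ("l2", "lbfgs")]:
        clf = LogisticRegression(penalty=penalty, C=0.1, solver=solver, max_iter=5000).fit(Xtr, ytr)
        print(penalty, "test error:", np.mean(clf.predict(Xte) != yte))
    ```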

  • Loureiro, B., Gerbelot, C., Cui, H., Goldt, S., Mezard, M., Krzakala, F., Zdeborova, L. Capturing the learning curves of realistic data sets with a teacher-student model, Advances in Neural Information Processing Systems (NeurIPS), 2021.
    [arXiv]

    Abstract: Teacher-student models provide a framework in which the typical-case performance of high-dimensional supervised learning can be described in closed form. The assumptions of Gaussian i.i.d. input data underlying the canonical teacher-student model may, however, be perceived as too restrictive to capture the behaviour of realistic data sets. In this paper, we introduce a Gaussian covariate generalisation of the model where the teacher and student can act on different spaces, generated with fixed, but generic feature maps. While still solvable in a closed form, this generalization is able to capture the learning curves for a broad range of realistic data sets, thus redeeming the potential of the teacher-student framework. Our contribution is then two-fold: First, we prove a rigorous formula for the asymptotic training loss and generalisation error. Second, we present a number of situations where the learning curve of the model captures that of a realistic data set learned with kernel regression and classification, with out-of-the-box feature maps such as random projections or scattering transforms, or with pre-learned ones - such as the features learned by training multi-layer neural networks. We discuss both the power and the limitations of the framework.
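
    Schematically, the Gaussian covariate model can be written as below (notation simplified; the paper's normalizations and covariance conventions may differ): teacher and student act on jointly Gaussian features u and v, and the student performs regularized ERM on its own features.

    ```latex
    % Jointly Gaussian teacher/student features and a teacher-generated label:
    (u, v) \sim \mathcal{N}\!\left( 0, \begin{pmatrix} \Psi & \Phi \\ \Phi^{\top} & \Omega \end{pmatrix} \right),
    \qquad y = f_0\big( \theta_0^{\top} u \big).
    % Student: regularized empirical risk minimization on the features v only:
    \hat{w} = \underset{w}{\arg\min}\; \sum_{i=1}^{n} \ell\big( y_i, w^{\top} v_i \big) + \lambda\, r(w).
    ```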

  • Gerbelot, C., Abbara, A., & Krzakala, F. Asymptotic Errors for Teacher-Student Convex Generalized Linear Models (or: How to Prove Kabashima's Replica Formula), IEEE Transactions on Information Theory, 2020.
    [arXiv]

    Abstract: There has been a recent surge of interest in the study of asymptotic reconstruction performance in various cases of generalized linear estimation problems in the teacher-student setting, especially for the case of i.i.d. standard normal matrices. Here, we go beyond these matrices, and prove an analytical formula for the reconstruction performance of convex generalized linear models with rotationally-invariant data matrices with arbitrary bounded spectrum, rigorously confirming a conjecture originally derived using the replica method from statistical physics. The formula includes many problems such as compressed sensing or sparse logistic classification. The proof is achieved by leveraging message passing algorithms and the statistical properties of their iterates, allowing us to characterize the asymptotic empirical distribution of the estimator. Our proof is crucially based on the construction of converging sequences of an oracle multi-layer vector approximate message passing algorithm, where the convergence analysis is done by checking the stability of an equivalent dynamical system. We illustrate our claim with numerical examples on mainstream learning methods such as sparse logistic regression and linear support vector classifiers, showing excellent agreement between moderate size simulation and the asymptotic prediction.
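
    Schematically, the setting can be summarized as below (g and f stand for a generic convex loss and regularizer; normalizations are omitted, and this is only a reading aid, not the paper's exact notation).

    ```latex
    % Rotationally invariant data matrix F = U S V^T with bounded spectrum, labels y
    % generated from a teacher acting on F x^*, and the convex ERM estimator
    \hat{x} \;=\; \underset{x \in \mathbb{R}^{N}}{\arg\min}\;
                  \sum_{\mu=1}^{M} g\big( y_\mu, (F x)_\mu \big) \;+\; f(x),
    % whose asymptotic error, e.g. \lim_{N \to \infty} \frac{1}{N} \lVert \hat{x} - x^* \rVert^2,
    % is characterized by a finite set of scalar fixed-point equations (the replica prediction).
    ```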

  • Gerbelot, C., Abbara, A., & Krzakala, F. Asymptotic errors for convex penalized linear regression beyond Gaussian matrices, Conference on Learning Theory (COLT), 2020, PMLR vol. 125, pp. 1682-1713.
    [journal, arXiv]

    Abstract: We consider the problem of learning a coefficient vector x_0 ∈ R^N from noisy linear observations y = F x_0 + w ∈ R^M in the high-dimensional limit M, N → ∞ with α ≡ M/N fixed. We provide a rigorous derivation of an explicit formula, first conjectured using heuristic methods from statistical physics, for the asymptotic mean squared error obtained by penalized convex regression estimators such as the LASSO or the elastic net, for a class of very generic random matrices corresponding to rotationally invariant data matrices with arbitrary spectrum. The proof is based on a convergence analysis of an oracle version of vector approximate message-passing (oracle-VAMP) and on the properties of its state evolution equations. Our method leverages and highlights the link between vector approximate message-passing, Douglas-Rachford splitting and proximal descent algorithms, extending previous results obtained with i.i.d. matrices for a large class of problems. We illustrate our results on some concrete examples and show that even though they are asymptotic, our predictions agree remarkably well with numerics even for very moderate sizes.
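
    In the spirit of the paper's numerical checks, here is a quick sketch running the LASSO on a rotationally invariant design F = U diag(s) V^T with (approximately) Haar factors; the sizes, spectrum and penalty are arbitrary choices, not those used in the paper.

    ```python
    # LASSO on a rotationally invariant design F = U diag(s) V^T.
    # Sizes, spectrum and penalty are arbitrary illustrative choices.
    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    N, M = 500, 1000                                  # alpha = M/N = 2
    U, _ = np.linalg.qr(rng.standard_normal((M, M)))  # approximately Haar (QR of a Gaussian matrix)
    V, _ = np.linalg.qr(rng.standard_normal((N, N)))
    s = rng.uniform(0.5, 1.5, size=N)                 # arbitrary bounded spectrum
    F = (U[:, :N] * s) @ V.T

    x0 = rng.standard_normal(N) * (rng.random(N) < 0.2)   # sparse signal
    y = F @ x0 + 0.05 * rng.standard_normal(M)

    xhat = Lasso(alpha=0.01, max_iter=50000).fit(F, y).coef_
    print("MSE per coordinate:", np.mean((xhat - x0) ** 2))
    ```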

  • Ilton, M., Couchman, M. M., Gerbelot, C., Benzaquen, M., Fowler, P. D., Stone, H. A., … & Salez, T. Capillary leveling of freestanding liquid nanofilms, Physical Review Letters, 117(16), 167801, 2016.
    [journal, arXiv]

    Abstract: We report on the capillary-driven levelling of a topographical perturbation at the surface of a free-standing liquid nanofilm. The width of a stepped surface profile is found to evolve as the square root of time. The hydrodynamic model is in excellent agreement with the experimental data. In addition to exhibiting an analogy with diffusive processes, this novel system serves as a precise nanoprobe for the rheology of liquids at interfaces in a configuration that avoids substrate effects.

Some recent talks

  • Joint Statistical Meeting (JSM, Portland USA), invited speaker, August 2024

  • EPFL workshop on Machine Learning Theory (Lemanth), invited speaker, May 2024

  • NYU Students and Postdocs Probability seminar, April 2024

  • Harvard Probability and Statistics seminar (Probabilitas seminar series), March 2024

  • Cargese Summer School on Statistical Physics and Machine Learning (invited speaker), 2023

  • Porquerolles Workshop on High Dimensional Statistics and Random Matrices (short talk), 2023

  • Princeton Workshop on Physics for Neural Networks (invited speaker), 2023

  • NYU CDS group seminar, November 2022

  • NYU Courant Postdoc seminar, October 2022

  • Les Houches Summer School on Statistical Physics and Machine Learning, July 2022

  • INRIA/DYOGENE group seminar, April 2022

Reviewing

  • Journal of Statistical Mechanics, Theory and Experiment (JSTAT)

  • IEEE Transactions on Information Theory

  • The Annals of Statistics

  • Information and Inference

  • Journal of Machine Learning Research (JMLR)

  • Advances in Neural Information Processing Systems (NeurIPS, 2021/2022)

  • International Conference on Machine Learning (ICML, 2022/2023)

Teaching

  • Undergraduate Mathematical Statistics - NYU Courant - Fall 2024

  • Graduate Essentials of Probability - NYU Courant - Spring 2023

  • Graduate Computational Statistics - NYU Courant and Tandon School of Engineering - Fall 2022