Research
Research Interests
In my academic research, I have worked broadly on the mathematical and statistical foundations of machine learning and artificial intelligence, with a more recent emphasis on the practical engineering challenges of scaling AI architectures and algorithms.
My current research aims to advance state-of-the-art optimizers and training strategies for language models in a theoretically principled way, improving scaling and stability in both pre-training and post-training.
In particular, I am interested in:
Algorithmic and engineering aspects of large-scale distributed optimization, including stochastic, nonsmooth, nonconvex, and/or distributionally robust optimization, with applications to large-scale distributed pre-training of large language models, e.g., efficient optimizers and (data and/or model) parallelism strategies; a generic distributionally robust formulation is sketched after this list
Theory and applications of optimization and sampling techniques in generative artificial intelligence (GenAI), e.g., efficient (pre-)training strategies for attention-based language and vision models such as large language models (LLMs), vision transformers (ViTs), and multi-modal models (e.g., vision-language models, VLMs)
The interplay between optimization and sampling
High-dimensional statistical inference for modern GenAI
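For concreteness, the distributionally robust setting referenced in the first item above can be sketched in its generic textbook form (this is an illustrative formulation, not a description of any specific method of mine; the divergence \(D\) and ambiguity radius \(\rho\) are placeholders):

\[
\min_{\theta \in \Theta} \; \max_{Q \in \mathcal{U}(\widehat{P})} \; \mathbb{E}_{x \sim Q}\bigl[\ell(\theta; x)\bigr],
\qquad
\mathcal{U}(\widehat{P}) \;=\; \bigl\{\, Q \;:\; D\bigl(Q \,\Vert\, \widehat{P}\bigr) \le \rho \,\bigr\},
\]

where \(\widehat{P}\) is the empirical training distribution, \(\ell(\theta; x)\) is the per-example loss, and the inner maximization seeks the worst-case data distribution within a divergence ball of radius \(\rho\) around \(\widehat{P}\).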
Funding and Grants
Academic Services
Reviewer for
Conferences
NeurIPS 2020, 2021, 2022, 2023, 2024, 2025
ICML 2021, 2022, 2023, 2024, 2025
ICLR 2021, 2022, 2024
AISTATS 2020, 2021, 2022, 2024, 2025