Research
Research Interests
In my academic research, I have worked broadly on the mathematical and statistical foundations of machine learning and artificial intelligence, with a more recent emphasis on the practical engineering challenges of scaling AI architectures and algorithms.
My current research aims to advance state-of-the-art optimizers and training strategies for language models in a theoretically principled way, improving the scaling and stability of both pre-training and post-training.
In particular, I am interested in:
Advancing the science of pre-training and scaling large deep learning models through algorithmic and engineering perspectives on large-scale distributed stochastic nonsmooth nonconvex optimization, including efficient optimizers and parallelism strategies
Theory of optimization and sampling techniques and their applications to generative AI (GenAI), e.g., efficient (pre-)training of large language and vision models, mixture-of-experts models, multimodal models, and diffusion models
The interplay between optimization and sampling
Funding and Grants
Academic Services
Reviewer for
Conferences
NeurIPS 2020, 2021, 2022, 2023, 2024, 2025, 2026
ICML 2021, 2022, 2023, 2024, 2025
ICLR 2021, 2022, 2024
AISTATS 2020, 2021, 2022, 2024, 2025