Research

Research Interests

In my academic research, I have worked broadly on the mathematical and statistical foundations of machine learning and artificial intelligence, with a more recent emphasis on the real engineering challenges of scaling AI architectures and algorithms. My current research aims to advance state-of-the-art optimizers and training strategies that scale and stabilize language model training, in both pre-training and post-training, in a theoretically principled way.

In particular, I am interested in

  • Advancing pre-training science and scaling of large deep learning models through algorithmic and engineering perspectives on large-scale distributed stochastic nonsmooth nonconvex optimization methods, including efficient optimizers and parallelism strategies

  • Theory and applications of optimization and sampling techniques to generative AI (GenAI), e.g., efficient (pre-)training of large language and vision models, mixture-of-experts models, multimodal models and diffusion models

  • The interplay between optimization and sampling

Funding and Grants

  • The University of Chicago Data Science Institute — AI + Science Research Initiative:

    • Project Support Funds (Principal Investigator) with $20,000 equivalent of GPU compute (2024)

    • Project Title: Advancing state-of-the-art large-scale distributed training methods in the era of generative AI

Academic Service

Reviewer for

  • Conferences

    • NeurIPS 2020, 2021, 2022, 2023, 2024, 2025, 2026

    • ICML 2021, 2022, 2023, 2024, 2025

    • ICLR 2021, 2022, 2024

    • AISTATS 2020, 2021, 2022, 2024, 2025