Weihao Cui

Xtra Computing Group, National University of Singapore


I am currently a postdoctoral research fellow working with Prof. Bingsheng He at the National University of Singapore. I also work closely with Prof. Minyi Guo, Prof. Quan Chen, and Dr. Han Zhao.

I obtained my Ph.D. from the Department of Computer Science and Engineering (CSE), Shanghai Jiao Tong University, China, where I was supervised by Prof. Quan Chen and worked on AI systems and cloud computing.

News

Dec 10, 2025 Two papers accepted to NSDI 2026.
Nov 08, 2025 One paper accepted to HPCA 2026.
Oct 15, 2025 Serving as the Web Chair for ICPP 2026. Submission details are available in the Call for Papers.
Sep 28, 2025 PD-Multiplexing has been merged into SGLang.

Selected publications

  1. arXiv
    Optimizing SLO-oriented LLM Serving with PD-Multiplexing
    Weihao Cui, Yukang Chen, Han Zhao, Ziyi Xu, Quan Chen, Xusheng Chen, Yangjie Zhou, Shixuan Sun, and Minyi Guo
    arXiv preprint arXiv:2504.14489, 2025
  2. arXiv
    Efficient Function-as-a-Service for Large Language Models with TIDAL
    Weihao Cui, Ziyi Xu, Han Zhao, Quan Chen, Zijun Li, Bingsheng He, and Minyi Guo
    arXiv preprint arXiv:2503.06421, 2025
  3. NSDI ’26
    Flare: Anomaly diagnostics for divergent LLM training in GPU clusters of thousand-plus scale
    Weihao Cui, Ji Zhang, Han Zhao, Chao Liu, Wenhao Zhang, Jian Sha, Bingsheng He, Minyi Guo, and Quan Chen
    In Proceedings of the 23rd USENIX Symposium on Networked Systems Design and Implementation, 2026
  4. OSDI ’23
    Optimizing dynamic neural networks with Brainstorm
    Weihao Cui, Zhenhua Han, Lingji Ouyang, Yichuan Wang, Ningxin Zheng, Lingxiao Ma, Yuqing Yang, Fan Yang, Jilong Xue, Lili Qiu, and 4 more authors
    In 17th USENIX Symposium on Operating Systems Design and Implementation, 2023
  5. ATC ’22
    DVABatch: Diversity-aware Multi-Entry Multi-Exit batching for efficient processing of DNN services on GPUs
    Weihao Cui, Han Zhao, Quan Chen, Hao Wei, Zirui Li, Deze Zeng, Chao Li, and Minyi Guo
    In 2022 USENIX Annual Technical Conference, 2022
  6. SC ’21
    Enable simultaneous DNN services based on deterministic operator overlap and precise latency prediction
    Weihao Cui, Han Zhao, Quan Chen, Ningxin Zheng, Jingwen Leng, Jieru Zhao, Zhuo Song, Tao Ma, Yong Yang, Chao Li, and 1 more author
    In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2021