* indicates co-first author, † indicates corresponding author
Full List of Papers
Preprint Papers
- Efficient Function-as-a-Service for Large Language Models with TIDAL. arXiv
- XPUTimer: Anomaly Diagnostics for Divergent LLM Training in GPU Clusters of Thousand-Plus Scale. arXiv
- Vortex: Efficient Sample-Free Dynamic Tensor Program Optimization via Hardware-aware Strategy Space Hierarchization. arXiv
- The CAP Principle for LLM Serving. arXiv
- Towards Fast Setup and High Throughput of GPU Serverless Computing. arXiv
- A Codesign of Scheduling and Parallelization for Large Model Training in Heterogeneous Clusters. arXiv
Conference Papers
- Improving GPU Sharing Performance through Adaptive Bubbleless Spatial-Temporal Sharing. (EuroSys 2024). pdf
- Microless: Cost-Efficient Hybrid Deployment of Microservices on IaaS VMs and Serverless. (ICPADS 2023). pdf
- Maximizing the Utilization of GPUs Used by Cloud Gaming through Adaptive Co-location with Combo. (SoCC 2023). pdf
- Optimizing Dynamic Neural Networks with Brainstorm. (OSDI 2023). pdf
- AdaptGear: Accelerating GNN Training via Adaptive Subgraph-Level Kernels on GPUs. (CF 2023). pdf
- DVABatch: Diversity-aware Multi-Entry Multi-Exit Batching for Efficient Processing of DNN Services on GPUs. (ATC 2022). pdf
- PAME: Precision-Aware Multi-Exit DNN Serving for Reducing Latencies of Batched Inferences. (ICS 2022). pdf
- Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS. (HPCA 2022). pdf
- Enable Simultaneous DNN Services Based on Deterministic Operator Overlap and Precise Latency Prediction. (SC 2021). pdf
- Exploiting Intra-SM Parallelism in GPUs via Persistent and Elastic Blocks. (ICCD 2021). pdf
- CODA: Improving Resource Utilization by Slimming and Co-locating DNN and CPU Jobs. (ICDCS 2020). pdf
- Ebird: Elastic Batch for Improving Responsiveness and Throughput of Deep Learning Services. (ICCD 2019). pdf
- Laius: Towards Latency Awareness and Improved Utilization of Spatial Multitasking Accelerators in Datacenters. (ICS 2019). pdf
Journal Papers
- ARACHNE: Optimizing Distributed Parallel Applications with Reduced Inter-Process Communication. (TACO 2025).
- Taming Flexible Job Packing in Deep Learning Training Clusters. (TACO 2025).
- Adaptive Kernel Fusion for Improving the GPU Utilization while Ensuring QoS. (TC 2024). pdf
- Accelerating Sparse DNNs based on Tiled GEMM. (TC 2024). pdf
- Improving Cluster Utilization through Adaptive Resource Management for DNN and CPU Jobs Co-location. (TC 2023). pdf
- ISPA: Exploiting Intra-SM Parallelism in GPUs via Fine-grained Resource Management. (TC 2022). pdf
- Toward QoS-Awareness and Improved Utilization of Spatial Multitasking GPUs. (TC 2021). pdf
- E2bird: Enhanced Elastic Batch for Improving Responsiveness and Throughput of Deep Learning Services. (TPDS 2020). pdf