Zhen Xie's Home Page

I am looking for multiple highly motivated PhD students (fully funded) to join our research group starting in Fall 2024. Required qualification: a master's degree in CS/ECE. Visiting scholars and interns are also welcome.

His research focuses on High-Performance Computing (HPC), with an emphasis on the interaction between machine learning algorithms and system-level performance optimization. His work has been published in top-tier conferences and journals, including PPoPP, SC, ICS, EuroSys, Euro-Par, TPDS, and TACO, and received the ACM Gordon Bell Special Prize in 2022.

Previously, he was a postdoctoral researcher at Argonne National Laboratory and the University of California, Merced, where he worked on building efficient system support for HPC and AI/DL workloads on persistent memory and GPU platforms. He obtained his Ph.D. from the Institute of Computing Technology, Chinese Academy of Sciences.

For more information, please see the Curriculum Vitae.

• Performance optimization on HPC and AI/DL applications
• Parallel computing on various architectures
• Heterogeneous computing and memory systems
• Scientific machine learning

[06/2024] A paper "Centimani: Enabling Fast AI Accelerator Selection for DNN Training with a Novel Performance Predictor" is accepted into USENIX ATC'24.
[04/2024] A paper "CereSZ: Enabling and Scaling Error-bounded Lossy Compression on Cerebras CS-2" is accepted into HPDC'24.
[12/2023] A paper "Thorough Characterization and Analysis of Large Transformer Model Training At-Scale" is accepted into ACM SIGMETRICS'24.
[11/2023] Two workshop papers are accepted into SC'23.
[05/2023] A paper "TrainBF: High-Performance DNN Training Engine using BFloat16 on AI Accelerators" is accepted into Euro-Par'23.
[02/2023] A paper "Transfer Learning Across Heterogeneous Features For Efficient Tensor Program Generation" is accepted into ExHET'23.
[01/2023] Received two Impact Argonne Awards in recognition of contributions to AI for Science for High Performance Computing and to the Enhancement of Argonne's Reputation.
[11/2022] Our work on LLM-based COVID-19 variant prediction models (GenSLMs) won the ACM Gordon Bell Special Prize at SC'22! Coverage: ACM, HPCwire, EurekAlert, NVIDIA, Newswise.
[11/2022] A paper "Merchandiser: Data Placement on Heterogeneous Memory for Task-Parallel HPC Applications with Load-Balance Awareness" is accepted into PPoPP'23.
[10/2022] Will serve as a shadow PC member for EuroSys'23.
[09/2022] A paper "A Comprehensive Evaluation of Novel AI Accelerators for Deep Learning Workloads" is accepted at the 13th IEEE International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS) at SC'22.
[09/2022] A tutorial on "Programming New AI Accelerators for Scientific Computing" is accepted and will be presented at SC'22.
[08/2022] Invited talk at the Argonne Training Program on Extreme-Scale Computing (ATPESC 2022).
[04/2022] Invited talk and panelist at Berkeley Lab: "Throughput-oriented and Accuracy-aware DNN Training with BFloat16 on GPU".
[03/2022] A paper "Throughput-oriented and Accuracy-aware DNN Training with BFloat16 on GPU" is accepted at IPDPSW'22.
[12/2021] A paper "TLB-pilot: Mitigating TLB Contention Attack on GPUs with Microarchitecture-Aware Scheduling" is accepted by ACM Transactions on Architecture and Code Optimization (TACO).

[USENIX ATC'24] Zhen Xie, Murali Emani, Xiaodong Yu, Dingwen Tao, Xin He, Pengfei Su, Keren Zhou, Venkatram Vishwanath, "Centimani: Enabling Fast AI Accelerator Selection for DNN Training with a Novel Performance Predictor." 2024 USENIX Annual Technical Conference. (77/488=15.8%)
[HCW'24] Murali Emani, Sam Foreman, Varuni Sastry, Zhen Xie, Siddhisanket Raskar, William Arnold, Rajeev Thakur, Venkatram Vishwanath, Michael E. Papka, Sanjif Shanmugavelu, Darshan Gandhi, Dun Ma, Kiran Ranganath, Rick Weisner, Jiunn-yeu Chen, Yuting Yang, Natalia Vassilieva, Bin C. Zhang, Sylvia Howland, Alexandar Tsyplikhin, "Toward a Holistic Performance Evaluation of Large Language Models Across Diverse AI Accelerators." Heterogeneity in Computing Workshop (HCW) at IPDPS'24, 2024.
[SIGMETRICS'24] Scott Cheng, Jun-Liang Lin, Murali Emani, Siddhisanket Raskar, Sam Foreman, Zhen Xie, Venkatram Vishwanath, Mahmut Kandemir, "Thorough Characterization and Analysis of Large Transformer Model Training At-Scale." 50th ACM on Measurement and Analysis of Computing Systems, 2023. (20/118=16.9%)
[arXiv'24] Murali Emani, Sam Foreman, Varuni Sastry, Zhen Xie, Sid Raskar, William Arnold, Rajeev Thakur, Venkatram Vishwanath, Michael E. Papka, "A Comprehensive Performance Study of Large Language Models on Novel AI Accelerators." arXiv preprint, 2024.
[MLG-HPCE'23] Xianzhong Ding, Le Chen, Murali Emani, Pei-Hung Lin, Tristan Vanderbruggen, Chunhua Liao, Zhen Xie, Alberto Cerpa, Wan Du, "HPC-GPT: Integrating Large Language Model for High-Performance Computing." Machine Learning with Graphs in High Performance Computing Environments (MLG-HPCE) at SC'23, 2023.
[Euro-Par'23] Zhen Xie, Siddhisanket Raskar, Murali Emani, and Venkatram Vishwanath, "TrainBF: High-Performance DNN Training Engine using BFloat16 on AI Accelerators." 29th International European Conference on Parallel and Distributed Computing, 2023. (49/164=29.9%)
[ExHET'23] Gaurav Verma, Siddhisanket Raskar, Zhen Xie, Abid M. Malik, Murali Emani, and Barbara Chapman, "Transfer Learning Across Heterogeneous Features For Efficient Tensor Program Generation." 2nd International Workshop on Extreme Heterogeneity Solutions, 2023.
[PPoPP'23] Zhen Xie, Jie Liu, Jiajia Li, and Dong Li, "Merchandiser: Data Placement on Heterogeneous Memory for Task-Parallel HPC Applications with Load-Balance Awareness." 27th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP), 2023. (31/131=23.7%)
[Gordon Bell'22] Maxim Zvyagin, Alexander Brace, Kyle Hippe, Yuntian Deng, Bin Zhang, Cindy Orozco Bohorquez, Austin Clyde, Bharat Kale, Danilo Perez-Rivera, Heng Ma, Carla M. Mann, Michael Irvin, J. Gregory Pauloski, Logan Ward, Valerie Hayot-Sasson, Murali Emani, Sam Foreman, Zhen Xie, Diangen Lin, Maulik Shukla, Weili Nie, Josh Romero, Christian Dallago, Arash Vahdat, Chaowei Xiao, Thomas Gibbs, Ian Foster, James J. Davis, Michael E. Papka, Thomas Brettin, Rick Stevens, Anima Anandkumar, Venkatram Vishwanath, Arvind Ramanathan, "GenSLMs: Genome-scale language models reveal SARS-CoV-2 evolutionary dynamics." Winner of the ACM Gordon Bell Special Prize for HPC-based COVID-19 research, 2022.
[PMBS'22] Murali Emani, Zhen Xie, Sid Raskar, Varuni Sastry, William Arnold, Bruce Wilson, Rajeev Thakur, Venkatram Vishwanath, Michael E. Papka, Cindy Orozco Bohorquez, Rick Weisner, Karen Li, Yongning Sheng, Yun Du, Jian Zhang, Alexander Tsyplikhin, Gurdaman Khaira, Jeremy Fowers, Ramakrishnan Sivakumar, Victoria Godsoe, Adrian Macias, Chetan Tekur, Matthew Boyd, "A Comprehensive Evaluation of Novel AI Accelerators for Deep Learning Workloads." 13th IEEE International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS) at SC, 2022.
[IPDPSW'22] Zhen Xie, Siddhisanket Raskar, and Murali Emani, "Throughput-oriented and Accuracy-aware DNN Training with BFloat16 on GPU." ScaDL Workshop at IPDPS, 2022.
[TACO] Bang Di, Daokun Hu, Zhen Xie, Jianhua Sun, Hao Chen, Jinkui Ren, Dong Li, "TLB-pilot: Mitigating TLB Contention Attack on GPUs with Microarchitecture-Aware Scheduling." ACM Transactions on Architecture and Code Optimization (TACO), 2021.
[SEC'21] Jie Liu, Jiawen Liu, Zhen Xie, and Dong Li, "Flame: A Self-Adaptive Auto-Labeling System for Heterogeneous Mobile Processors." ACM/IEEE Symposium on Edge Computing (SEC), 2021.
[TPDS] Zhen Xie, Guangming Tan, Weifeng Liu and Ninghui Sun, "A Pattern Based SpGEMM Library for Multi-core and Many-core Architectures." IEEE Transactions on Parallel and Distributed Systems (TPDS), 2021.
[ICS'21] Zhen Xie, Wenqian Dong, Jie Liu, Ivy Peng, Yanbao Ma, and Dong Li, "MD-HM: Memoization-based Molecular Dynamics Simulations on Big Memory System." 35th ACM International Conference on Supercomputing, 2021. (38/157=24.2%)
[ICS'21] Xin He, Jiawen Liu, Zhen Xie, Hao Chen, Guoyang Chen, Weifeng Zhang, and Dong Li, "Enabling Energy-Efficient DNN Training on Hybrid GPU-FPGA Accelerators." 35th ACM International Conference on Supercomputing, 2021. (38/157=24.2%)
[EuroSys'21] Zhen Xie, Wenqian Dong, Jiawen Liu, Hang Liu and Dong Li, "Tahoe: Tree Structure-Aware High Performance Inference Engine for Decision Tree Ensemble on GPU." 16th ACM European Conference on Computer Systems, 2021. (38/191=19.9%)
[SC'20] Wenqian Dong, Zhen Xie, Gokcen Kestor and Dong Li, "Smart-PGSim: Using Neural Network to Accelerate AC-OPF Power Grid Simulation." International Conference for High Performance Computing, Networking, Storage and Analysis, 2020. (95/378=25.1%)
[USENIX OpML'20] Jiawen Liu, Zhen Xie, Dimitrios Nikolopoulos, and Dong Li, "RIANN: Real-time Incremental Learning with Approximate Nearest Neighbor on Mobile Devices." USENIX Conference on Operational Machine Learning, 2020.
[MLSys-W'20] Jiawen Liu, Jie Liu, Zhen Xie, and Dong Li, "Flame: A Self-Adaptive Auto-Labeling System for Heterogeneous Mobile Processors." On-Device Intelligence Workshop at Machine Learning and Systems Conference, 2020.
[ICS'19] Zhen Xie, Guangming Tan, Weifeng Liu and Ninghui Sun, "IA-SpGEMM: an Input-aware Auto-tuning Framework for Parallel Sparse Matrix-Matrix Multiplication." 33rd ACM International Conference on Supercomputing, 2019. (45/193=23.3%)
[SC'19] Wenqian Dong, Jie Liu, Zhen Xie, and Dong Li, "Adaptive Neural Network-Based Approximation to Accelerate Eulerian Fluid Simulation." International Conference for High Performance Computing, Networking, Storage and Analysis, 2019. (87/344=25.3%)
[ICPADS'16] Zhen Xie, Zheng Cao, Zhan Wang, Dawei Zang, En Shao and Ninghui Sun, "Modeling Traffic of Big Data Platform for Large Scale Datacenter Networks." 22nd IEEE International Conference on Parallel and Distributed Systems, 2016. (123/412=29.9%)
Zhen Xie, Guangming Tan and Ninghui Sun, "PRF: A Process-RAM-Feedback Performance Model to Reveal Bottlenecks and Propose Optimizations." High Technology Letters, 2019.
Zhen Xie, Guangming Tan and Ninghui Sun, "Revealing Bottlenecks and Predicting Optimal Performance of Sparse Matrix-Vector and Convolution Using the Probability-Process-RAM Model." Computer Research and Development, 2020.

Reviewers: SC'23, DCAA'23, ICCD'22, LCTES'21, ICS'21, IPDPS'21, IPDPS'20, NPC'20, IPDPS'19, ICPP'19, PPoPP'19, Cluster'19, NPC'19, SC'18, CCGrid'17, etc.
PC members: EuroSys'23, CCGrid'24, LCTES'24, ICPP'24
Appointed Journal Reviewers: TPDS, TECS, TACO, JPDC and JHPC.
Student Volunteer: ICS'19, ICS'18, ICPADS'16.

Last updated: February 2024.