Kun Li (李琨)
Researcher, Systems and Networking Research Group, Microsoft Research Asia

Research interests:  
large-scale AI4Science system   high-performance parallel algorithm   LLM distributed training
Links:     [Microsoft Homepage]   [Bilibili]   [WeChat]   [Zhihu]   [Google Scholar]   [ResearchGate]
Contacts:   [kunli [at] microsoft [dot] com]

Brief Biography

  • Dr. Kun Li has been a Researcher in the Systems and Networking Research Group at Microsoft Research Asia (MSRA) since Jul. 2022. His research interests include large-scale AI4Science systems, high-performance parallel algorithms, and LLM distributed training. He has published at leading international conferences and journals (SC, PPOPP, IPDPS, IEEE TPDS, etc.).
  • He received his Ph.D. from the State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences (ICT, CAS) in 2022. His thesis, "Research and Application on Multi-level Discontinuous and Nonlinear Scalability for Massively Parallelism", received the CCF Outstanding Doctoral Dissertation Award and the ACM SIGHPC China Outstanding Doctoral Dissertation Award.
  • He now leads the Cloud4Science project at Microsoft Research. If you are interested in HPC+AI, please contact him about collaboration (intern, visiting scholar, gap-year student, part-time collaborator, ...).
    News

    • [Mar. 2024] Our paper "ConvStencil: Transform Stencil Computation to Matrix Multiplication on Tensor Cores" wins PPOPP'24 Best Paper Award!
    • [Nov. 2023] Our paper "ConvStencil: Transform Stencil Computation to Matrix Multiplication on Tensor Cores" is accepted by PPOPP'24. Congratulations to Yuetao!
    • [Jan. 2023] Received the 2022 CCF Outstanding Doctoral Dissertation Award! [More]
    • [Nov. 2022] Received the 2022 ACM SIGHPC China Outstanding Doctoral Dissertation Award!

    Publications

      *: Corresponding author.
    • [IPDPS'2024]   Luhan Wang, Haipeng Jia, Lei Xu, Cunyang Wei, Kun Li, Xianmeng Jiang, Yunquan Zhang. VNEC: A Vectorized Non-Empty Column Format for SpMV on CPUs. 38th IEEE International Parallel & Distributed Processing Symposium, 2024.
    • [PPOPP'2024] [Best Paper Award]   Yuetao Chen, Kun Li*, Yuhao Wang, Donglin Bai, Lei Wang, Lingxiao Ma, Liang Yuan, Yunquan Zhang, Ting Cao, Mao Yang. ConvStencil: Transform Stencil Computation to Matrix Multiplication on Tensor Cores. ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2024. [Paper]
    • [ICS'2023]   Tun Chen, Haipeng Jia, Yunquan Zhang, Kun Li, Zhihao Li, Xiang Zhao, Jianyu Yao. OpenFFT: An Adaptive Tuning Framework for 3D FFT on ARM Multicore CPUs. International Conference on Supercomputing, 2023.
    • [TPDS'2023]   Hang Cao, Liang Yuan, He Zhang, Yunquan Zhang, Baodong Wu, Kun Li, Shigang Li, Minghua Zhang, Pengqi Lu, and Junmin Xiao. AGCM-3DLF: Accelerating Atmospheric General Circulation Model via 3D Parallelization and Leap-Format. IEEE Transactions on Parallel and Distributed Systems, 2023.
    • [HPCC'2022]   Luhan Wang, Haipeng Jia, Yunquan Zhang, Kun Li, and Cunyang Wei. EgpuIP: An Embedded GPU Accelerated Library for Image Processing. The 24th IEEE International Conferences on High Performance Computing and Communications, 2022.
    • [HPCC'2022]   Cunyang Wei, Haipeng Jia, Yunquan Zhang, Kun Li, and Luhan Wang. LBBGEMM: A Load-Balanced Batch GEMM Framework on ARM CPUs. The 24th IEEE International Conferences on High Performance Computing and Communications, 2022.
    • [IPDPS'2022]   Kun Li, Liang Yuan, Yunquan Zhang, Yue Yue, and Hang Cao. An Efficient Vectorization Scheme for Stencil Computation. The 36th IEEE International Parallel & Distributed Processing Symposium, 2022. [Paper]
    • [TPDS'2022]   Kun Li, Liang Yuan, Yunquan Zhang, and Gongwei Chen. An Accurate and Efficient Large-scale Regression Method through Best Friend Clustering. IEEE Transactions on Parallel and Distributed Systems, 2022. [Paper]
    • [SC'2021]   Kun Li, Liang Yuan, Yunquan Zhang, and Yue Yue. Reducing Redundancy in Data Organization and Arithmetic Calculation for Stencil Computations. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, ACM, 2021. [Paper]
    • [SC'2021]   Liang Yuan, Hang Cao, Yunquan Zhang, Kun Li, Pengqi Lu, and Yue Yue. Temporal Vectorization for Stencils. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, ACM, 2021. [Paper]
    • [SC'2019]   Kun Li, Honghui Shang, Yunquan Zhang, Shigang Li, Baodong Wu, Dong Wang, Libo Zhang, Fang Li, Dexun Chen, and Zhiqiang Wei. OpenKMC: a KMC design for hundred-billion-atom simulation using millions of cores on Sunway Taihulight. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, ACM, 2019. (Acceptance rate: 22.7%, 78/344) [Paper]
    • [CS'2019]   Dong Wang, Honghui Shang, Yunquan Zhang, Kun Li, Xinfu He, and Lixia Jia. Application of Atomic Dynamics Monte Carlo Program MISA-KMC in the Study of Irradiation Damage of Reactor Pressure Vessel Steel. CCF Computer Science, 2019.
    • [ISPA'2019]   Kun Li, Shigang Li, Bei Wang, Yifeng Chen, and Yunquan Zhang. swMD: Performance Optimizations for Molecular Dynamics Simulation on Sunway Taihulight. In 2019 IEEE International Symposium on Parallel & Distributed Processing with Applications, pp. 511-518. IEEE, 2019. [Paper]
    • [JSUPERCOMPUT'2019]   Kun Li, Shigang Li, Shan Huang, Yifeng Chen, and Yunquan Zhang. FastNBL: fast neighbor lists establishment for molecular dynamics simulation based on bitwise operations. The Journal of Supercomputing (2019): 1-20. [Paper]
    • [ICPP'2018]   Junmin Xiao, Shigang Li, Baodong Wu, He Zhang, Kun Li, Erlin Yao, Yunquan Zhang, and Guangming Tan. Communication-Avoiding for Dynamical Core of Atmospheric General Circulation Model. In Proceedings of the 47th International Conference on Parallel Processing, 2018. [Paper]
    • [JCST'2017]   Kun Li, Haipeng Jia, Ting Cao, and Yunquan Zhang. The Implementation and Optimization of Multidimensional FFT Algorithm on Large-scale Clusters. The Journal of Frontiers of Computer Science and Technology, 2017. [Paper]
    • [HPCChina'2016]   Kun Li, Yan Li, Ting Cao, Haipeng Jia, and Yunquan Zhang. An MPI-based 3D FFT Implementation on CPU-GPU Heterogeneous Clusters. National Annual Conference on High Performance Computing, 2016.

    Media

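    • Feb. 24, 2023. Interviewed by Microsoft Research: "Science Craftsman | Kun Li: The 'model kid next door' devoted to high-performance computing research". [Microsoft] [WeChat] [Bilibili] [Zhihu] [Tencent]
    • Jan. 10, 2023. Interviewed by ICT, CAS: "Academic Research | Two ICT dissertations selected for the 2022 CCF Outstanding Doctoral Dissertation Incentive Program". [ICT] [WeChat]
    • Jul. 20, 2022. Interviewed by ICT, CAS: "Graduate Stories | To meet you, never giving up through countless tries". [WeChat]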

    Talks

    Selected Scholarships

    • CAS President Scholarship
    • ICT President Scholarship (Special Prize)
    • National Scholarship for Graduate Students
    • CAS-BHBT Joint Scholarship
    • CAS Outstanding Undergraduate Scholarship
    • UCAS Sugon Scholarship
    • UCAS Academic Scholarship (First Prize)
    • UCAS Outstanding Ph.D. Students Scholarship (First Prize)
    • Huawei Outstanding Cooperation Scholarship
    • ICT CARCH Outstanding Student Scholarship (First Prize)

    Selected Awards

    • Science Craftsman in Microsoft Research Asia
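    • China Computer Federation (CCF) Outstanding Doctoral Dissertation Award
    • Association for Computing Machinery (ACM) SIGHPC China Outstanding Doctoral Dissertation Award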
    • Microsoft Star of Tomorrow
    • CAS Outstanding League Member
    • UCAS Outstanding Communist Member
    • UCAS Merit Student
    • UCAS Excellent Student Cadre
    • ICT Outstanding Volunteer
    • ICT CARCH Excellent Student

    Last updated on 6/30/2023.