Selected Publications (For a full list, see Google Scholar)

23 Publications • 8 as Corresponding Author • 3 Awards
2025: 7 • 2024: 4 • 2023: 2 • 2022: 4 • 2021: 2 • 2019: 3 • 2018: 1
* denotes corresponding author.

  • 2025

      SparStencil: Retargeting Sparse Tensor Cores to Scientific Stencil Computations via Structured Sparsity Transformation

      Qi Li, Kun Li*, Haozhi Han, Liang Yuan, Junshi Chen, Yunquan Zhang, Yifeng Chen, Hong An, Ting Cao, Mao Yang

      SC 2025 🏆 Best Student Paper Award Finalist

      From Deep Learning to Deep Science: AI Accelerators Scaling Quantum Chemistry Beyond Limits

      Haozhi Han, Kun Li*, Fusong Ju, Qi Li, Yifeng Chen, Yunquan Zhang, Ting Cao, Mao Yang

      SC 2025

      Matryoshka: Optimization of Dynamic Diverse Quantum Chemistry Systems via Elastic Parallelism Transformation

      Tuowei Wang, Kun Li*, Donglin Bai, Fusong Ju, Leo Xia, Ju Ren, Yaoxue Zhang, Ting Cao, Mao Yang

      To appear

      JENGA: Enhancing LLM Long-Context Fine-tuning with Contextual Token Sparsity

      Tuowei Wang, Xingyu Chen, Kun Li, Ting Cao, Ju Ren, Yaoxue Zhang

      ATC 2025

      Neuralink: Fast LLM Inference on Smartphones with Neuron Co-Activation Linking

      Tuowei Wang, Ruwen Fan, Minxing Huang, Zixu Hao, Kun Li, Ting Cao, Youyou Lu, Yaoxue Zhang, Ju Ren

      ASPLOS 2025

      FlashFFTStencil: Bridging Fast Fourier Transforms to Memory-Efficient Stencil Computations on Tensor Core Units

      Haozhi Han, Kun Li*, Wei Cui, Donglin Bai, Yiwei Zhang, Liang Yuan, Yifeng Chen, Yunquan Zhang, Ting Cao, Mao Yang

      PPoPP 2025

      Jigsaw: Toward Conflict-free Vectorized Stencil Computation by Tessellating Swizzled Registers

      Yiwei Zhang, Kun Li*, Liang Yuan, Haozhi Han, Yunquan Zhang, Ting Cao, Mao Yang

      PPoPP 2025

  • 2024

      LoRAStencil: Low-Rank Adaptation of Stencil Computation on Tensor Cores

      Yiwei Zhang, Kun Li*, Liang Yuan, Jiawen Cheng, Yunquan Zhang, Ting Cao, Mao Yang

      SC 2024 🏆 Reproducibility Challenge Finalist

      LONG EXPOSURE: Accelerating Parameter-Efficient Fine-Tuning for LLMs under Shadowy Sparsity

      Tuowei Wang, Kun Li*, Zixu Hao, Donglin Bai, Ju Ren, Yaoxue Zhang, Ting Cao, Mao Yang

      SC 2024

      VNEC: A Vectorized Non-Empty Column Format for SpMV on CPUs

      Luhan Wang, Haipeng Jia, Lei Xu, Cunyang Wei, Kun Li, Xianmeng Jiang, Yunquan Zhang

      IPDPS 2024

      ConvStencil: Transform Stencil Computation to Matrix Multiplication on Tensor Cores

      Yuetao Chen, Kun Li*, Yuhao Wang, Donglin Bai, Lei Wang, Lingxiao Ma, Liang Yuan, Yunquan Zhang, Ting Cao, Mao Yang

      PPoPP 2024 🏆 Best Paper Award

  • 2023

      OpenFFT: An Adaptive Tuning Framework for 3D FFT on ARM Multicore CPUs

      Tun Chen, Haipeng Jia, Yunquan Zhang, Kun Li, Zhihao Li, Xiang Zhao, Jianyu Yao

      ICS 2023

      AGCM-3DLF: Accelerating Atmospheric General Circulation Model via 3D Parallelization and Leap-Format

      Hang Cao, Liang Yuan, He Zhang, Yunquan Zhang, Baodong Wu, Kun Li, Shigang Li, Minghua Zhang, Pengqi Lu, Junmin Xiao

      TPDS 2023

  • 2022

      EgpuIP: An Embedded GPU Accelerated Library for Image Processing

      Luhan Wang, Haipeng Jia, Yunquan Zhang, Kun Li, Cunyang Wei

      HPCC 2022

      LBBGEMM: A Load-Balanced Batch GEMM Framework on ARM CPUs

      Cunyang Wei, Haipeng Jia, Yunquan Zhang, Kun Li, Luhan Wang

      HPCC 2022

      An Efficient Vectorization Scheme for Stencil Computation

      Kun Li, Liang Yuan, Yunquan Zhang, Yue Yue, Hang Cao

      IPDPS 2022

      An Accurate and Efficient Large-scale Regression Method through Best Friend Clustering

      Kun Li, Liang Yuan, Yunquan Zhang, Gongwei Chen

      TPDS 2022

  • 2021

      Reducing Redundancy in Data Organization and Arithmetic Calculation for Stencil Computations

      Kun Li, Liang Yuan, Yunquan Zhang, Yue Yue

      SC 2021

      Temporal Vectorization for Stencils

      Liang Yuan, Hang Cao, Yunquan Zhang, Kun Li, Pengqi Lu, Yue Yue

      SC 2021

  • 2019

      OpenKMC: a KMC design for hundred-billion-atom simulation using millions of cores on Sunway Taihulight

      Kun Li, Honghui Shang, Yunquan Zhang, Shigang Li, Baodong Wu, Dong Wang, Libo Zhang, Fang Li, Dexun Chen, Zhiqiang Wei

      SC 2019

      swMD: Performance Optimizations for Molecular Dynamics Simulation on Sunway Taihulight

      Kun Li, Shigang Li, Bei Wang, Yifeng Chen, Yunquan Zhang

      ISPA 2019

      FastNBL: fast neighbor lists establishment for molecular dynamics simulation based on bitwise operations

      Kun Li, Shigang Li, Shan Huang, Yifeng Chen, Yunquan Zhang

      The Journal of Supercomputing (2019)

  • 2018

      Communication-Avoiding for Dynamical Core of Atmospheric General Circulation Model

      Junmin Xiao, Shigang Li, Baodong Wu, He Zhang, Kun Li, Erlin Yao, Yunquan Zhang, Guangming Tan

      ICPP 2018