Selected Publications (For a full list, see Google Scholar)
23 Publications • 8 as Corresponding Author • 3 Awards
2025: 7 • 2024: 4 • 2023: 2 • 2022: 4 • 2021: 2 • 2019: 3 • 2018: 1
* denotes corresponding author.
2025
2025
SparStencil: Retargeting Sparse Tensor Cores to Scientific Stencil Computations via Structured Sparsity Transformation
Qi Li, Kun Li*, Haozhi Han, Liang Yuan, Junshi Chen, Yunquan Zhang, Yifeng Chen, Hong An, Ting Cao, Mao Yang
SC 2025 🏆 Best Student Paper Award Finalist
2025
From Deep Learning to Deep Science: AI Accelerators Scaling Quantum Chemistry Beyond Limits
Haozhi Han, Kun Li*, Fusong Ju, Qi Li, Yifeng Chen, Yunquan Zhang, Ting Cao, Mao Yang
SC 2025
2025
Matryoshka: Optimization of Dynamic Diverse Quantum Chemistry Systems via Elastic Parallelism Transformation
Tuowei Wang, Kun Li*, Donglin Bai, Fusong Ju, Leo Xia, Ju Ren, Yaoxue Zhang, Ting Cao, Mao Yang
To appear
2025
JENGA: Enhancing LLM Long-Context Fine-tuning with Contextual Token Sparsity
Tuowei Wang, Xingyu Chen, Kun Li, Ting Cao, Ju Ren, Yaoxue Zhang
ATC 2025
2025
Neuralink: Fast LLM Inference on Smartphones with Neuron Co-Activation Linking
Tuowei Wang, Ruwen Fan, Minxing Huang, Zixu Hao, Kun Li, Ting Cao, Youyou Lu, Yaoxue Zhang, Ju Ren
ASPLOS 2025
2025
FlashFFTStencil: Bridging Fast Fourier Transforms to Memory-Efficient Stencil Computations on Tensor Core Units
Haozhi Han, Kun Li*, Wei Cui, Donglin Bai, Yiwei Zhang, Liang Yuan, Yifeng Chen, Yunquan Zhang, Ting Cao, Mao Yang
PPoPP 2025
2025
Jigsaw: Toward Conflict-free Vectorized Stencil Computation by Tessellating Swizzled Registers
Yiwei Zhang, Kun Li*, Liang Yuan, Haozhi Han, Yunquan Zhang, Ting Cao, Mao Yang
PPoPP 2025
2024
2024
LoRAStencil: Low-Rank Adaptation of Stencil Computation on Tensor Cores
Yiwei Zhang, Kun Li*, Liang Yuan, Jiawen Cheng, Yunquan Zhang, Ting Cao, Mao Yang
SC 2024 🏆 Reproducibility Challenge Finalist
2024
LONG EXPOSURE: Accelerating Parameter-Efficient Fine-Tuning for LLMs under Shadowy Sparsity
Tuowei Wang, Kun Li*, Zixu Hao, Donglin Bai, Ju Ren, Yaoxue Zhang, Ting Cao, Mao Yang
SC 2024
2024
VNEC: A Vectorized Non-Empty Column Format for SpMV on CPUs
Luhan Wang, Haipeng Jia, Lei Xu, Cunyang Wei, Kun Li, Xianmeng Jiang, Yunquan Zhang
IPDPS 2024
2024
ConvStencil: Transform Stencil Computation to Matrix Multiplication on Tensor Cores
Yuetao Chen, Kun Li*, Yuhao Wang, Donglin Bai, Lei Wang, Lingxiao Ma, Liang Yuan, Yunquan Zhang, Ting Cao, Mao Yang
PPoPP 2024 🏆 Best Paper Award
2023
2023
OpenFFT: An Adaptive Tuning Framework for 3D FFT on ARM Multicore CPUs
Tun Chen, Haipeng Jia, Yunquan Zhang, Kun Li, Zhihao Li, Xiang Zhao, Jianyu Yao
ICS 2023
2023
AGCM-3DLF: Accelerating Atmospheric General Circulation Model via 3D Parallelization and Leap-Format
Hang Cao, Liang Yuan, He Zhang, Yunquan Zhang, Baodong Wu, Kun Li, Shigang Li, Minghua Zhang, Pengqi Lu, Junmin Xiao
TPDS 2023
2022
2022
EgpuIP: An Embedded GPU Accelerated Library for Image Processing
Luhan Wang, Haipeng Jia, Yunquan Zhang, Kun Li, Cunyang Wei
HPCC 2022
2022
LBBGEMM: A Load-Balanced Batch GEMM Framework on ARM CPUs
Cunyang Wei, Haipeng Jia, Yunquan Zhang, Kun Li, Luhan Wang
HPCC 2022
2022
An Efficient Vectorization Scheme for Stencil Computation
Kun Li, Liang Yuan, Yunquan Zhang, Yue Yue, Hang Cao
IPDPS 2022
2022
An Accurate and Efficient Large-scale Regression Method through Best Friend Clustering
Kun Li, Liang Yuan, Yunquan Zhang, Gongwei Chen
TPDS 2022
2021
2021
Reducing Redundancy in Data Organization and Arithmetic Calculation for Stencil Computations
Kun Li, Liang Yuan, Yunquan Zhang, Yue Yue
SC 2021
2021
Temporal Vectorization for Stencils
Liang Yuan, Hang Cao, Yunquan Zhang, Kun Li, Pengqi Lu, Yue Yue
SC 2021
2019
2019
OpenKMC: a KMC design for hundred-billion-atom simulation using millions of cores on Sunway Taihulight
Kun Li, Honghui Shang, Yunquan Zhang, Shigang Li, Baodong Wu, Dong Wang, Libo Zhang, Fang Li, Dexun Chen, Zhiqiang Wei
SC 2019
2019
swMD: Performance Optimizations for Molecular Dynamics Simulation on Sunway Taihulight
Kun Li, Shigang Li, Bei Wang, Yifeng Chen, Yunquan Zhang
ISPA 2019
2019
FastNBL: fast neighbor lists establishment for molecular dynamics simulation based on bitwise operations
Kun Li, Shigang Li, Shan Huang, Yifeng Chen, Yunquan Zhang
The Journal of Supercomputing (2019)
2018
2018
Communication-Avoiding for Dynamical Core of Atmospheric General Circulation Model
Junmin Xiao, Shigang Li, Baodong Wu, He Zhang, Kun Li, Erlin Yao, Yunquan Zhang, Guangming Tan
ICPP 2018