Projects
Jan 2020 - Aug 2021
End-to-End Hardware Accelerators for LSTM, CNN, and Attention
Role: Project leader
Built datasets and trained networks for static and dynamic hand-gesture recognition as well as spoken-word recognition.
Designed FPGA-based accelerators for neural networks and audio/image processing, and deployed them in competitions, earning top national rankings.
Jan 2023 - Jul 2024
Combining the Systolic Array and Vector Unit
Role: Project leader
Investigated hybrid architectures that combine the programmability of vector units with the efficiency of systolic arrays.
This line of work led to two paper submissions to DAC 2025.
Jan 2024 - Jun 2024
Heterogeneous SoC Compiler Based on MLIR
Role: Main participant (second-ranked contributor)
Lowered diverse tensor operators to matrix multiplication via tensor contraction and built an MLIR-based compilation pipeline.
The compiler schedules kernels across heterogeneous accelerators to improve multithreaded performance and hardware utilization.
Mar 2024 - Present
Tensor Train Full-Process Accelerator for LLMs
Role: Project leader
Explores Tensor-Train decomposition for large language models, building on prior systolic-array work to design a dedicated accelerator.
The target architecture accelerates the decomposition itself while also supporting the vector-matrix multiplications used in LLM decoding.
Apr 2024 - Present
Analysis Framework for the Generality of ML Accelerators
Role: Project leader
Studies how spatial architectures such as systolic arrays and Eyeriss-style designs behave across broader operator spaces.
The goal is a tensor-algebra-based framework that identifies effective hardware configurations for arbitrary sets of tensor operators.