Jan 2020 - Aug 2021

End-to-End Hardware Accelerators for LSTM, CNN, and Attention

Role: Project leader

Built datasets and trained networks for static and dynamic hand-gesture recognition as well as spoken-word recognition.

Designed FPGA-based accelerators for neural networks and audio/image processing, and deployed them in national competitions, earning top rankings.

Jan 2023 - Jul 2024

Combining Systolic Arrays and Vector Units

Role: Project leader

Investigated hybrid architectures that combine the programmability of vector units with the efficiency of systolic arrays.
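
A minimal NumPy sketch of the idea: the systolic array covers GEMM-shaped work while the vector unit covers programmable elementwise tails. The tile size, function names, and the ReLU tail are illustrative assumptions, not the actual design.

```python
import numpy as np

TILE = 4  # illustrative systolic-array dimension

def systolic_matmul(A, B):
    """Emulate a TILE x TILE systolic array by tiling the GEMM;
    each inner tile corresponds to one pass through the array."""
    M, K = A.shape
    _, N = B.shape
    C = np.zeros((M, N))
    for i in range(0, M, TILE):
        for j in range(0, N, TILE):
            for k in range(0, K, TILE):
                C[i:i+TILE, j:j+TILE] += A[i:i+TILE, k:k+TILE] @ B[k:k+TILE, j:j+TILE]
    return C

def vector_unit(x):
    """Programmable vector lanes; ReLU stands in for an arbitrary elementwise op."""
    return np.maximum(x, 0.0)

A, B = np.random.randn(8, 8), np.random.randn(8, 8)
out = vector_unit(systolic_matmul(A, B))  # GEMM on the array, tail on the vector unit
assert np.allclose(out, np.maximum(A @ B, 0.0))
```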

This line of work led to two DAC 2025 paper submissions.

Jan 2024 - Jun 2024

Heterogeneous SoC Compiler Based on MLIR

Role: Core participant (second-ranked contributor)

Lowered diverse tensor operators to matrix multiplication via tensor contraction and built an MLIR-based compilation chain around that form.
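
As an illustration of the lowering step, a contraction whose free indices can be grouped reduces to a single GEMM. The einsum spec and helper below are assumptions for the sketch, not the compiler's actual IR.

```python
import numpy as np

def contract_as_matmul(A, B):
    """Compute einsum('abc,cd->abd') as one matrix multiplication."""
    a, b, c = A.shape
    c2, d = B.shape
    assert c == c2
    # Group the free indices (a, b) of A into a single GEMM dimension.
    C = A.reshape(a * b, c) @ B   # (a*b, c) x (c, d)
    return C.reshape(a, b, d)     # restore the free indices

A = np.random.randn(2, 3, 4)
B = np.random.randn(4, 5)
assert np.allclose(contract_as_matmul(A, B), np.einsum('abc,cd->abd', A, B))
```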

The compiler schedules kernels across heterogeneous accelerators to improve multithreaded performance and utilization.

Mar 2024 - Present

Tensor Train Full-Process Accelerator for LLMs

Role: Project leader

Explores Tensor Train decomposition for large language models, building on the earlier systolic-array work to design a dedicated accelerator.

The target architecture accelerates the decomposition itself while also supporting the vector-matrix multiplications used in LLM decoding.
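
A rough NumPy sketch of the decomposition side, using the standard TT-SVD procedure on a weight matrix reshaped into a higher-order tensor. The shapes and rank cap are illustrative; the accelerator datapath itself is not shown.

```python
import numpy as np

def tt_svd(T, max_rank):
    """Decompose tensor T into 3-way TT cores via sequential truncated SVDs."""
    dims, cores, r, mat = T.shape, [], 1, np.asarray(T)
    for k in range(len(dims) - 1):
        U, S, Vt = np.linalg.svd(mat.reshape(r * dims[k], -1), full_matrices=False)
        rk = min(max_rank, len(S))
        cores.append(U[:, :rk].reshape(r, dims[k], rk))  # core k: (r_{k-1}, n_k, r_k)
        mat, r = S[:rk, None] * Vt[:rk], rk              # remainder to factor next
    cores.append(mat.reshape(r, dims[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the TT cores back into a dense tensor (for checking)."""
    T = cores[0]
    for core in cores[1:]:
        T = np.tensordot(T, core, axes=([-1], [0]))
    return T.squeeze(axis=(0, -1))

W = np.random.randn(16, 16).reshape(4, 4, 4, 4)  # weight matrix as a 4-way tensor
cores = tt_svd(W, max_rank=16)                   # cap high enough for no truncation
assert np.allclose(tt_reconstruct(cores), W)
```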

Apr 2024 - Present

Analysis Framework for the Generality of ML Accelerators

Role: Project leader

Studies how spatial architectures such as systolic arrays and Eyeriss-style designs behave across a broader space of tensor operators.

The goal is a tensor-algebra-based framework that identifies effective hardware configurations for arbitrary ranges of tensor operators.
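
A toy sketch of the framework's flavor: operators described as einsum specs, with a predicate asking whether each maps onto a single GEMM-shaped (systolic-array-friendly) pass. The operator list and the mapping rule are illustrative assumptions.

```python
# An operator is GEMM-shaped iff its indices partition into
#   M: only in A and the output, N: only in B and the output,
#   K: shared between A and B and fully contracted.
def gemm_mappable(spec):
    inputs, out = spec.split('->')
    a, b = inputs.split(',')
    A, B, O = set(a), set(b), set(out)
    m, n, k = A - B, B - A, A & B
    return k.isdisjoint(O) and O == (m | n)

ops = {
    'matmul':         'ik,kj->ij',
    'matvec':         'ik,k->i',
    'batched matmul': 'bik,bkj->bij',  # shared batch index survives -> not one GEMM
    'outer product':  'i,j->ij',
    'elementwise':    'i,i->i',
}
for name, spec in ops.items():
    print(f'{name:14s} {spec:13s} GEMM-mappable: {gemm_mappable(spec)}')
```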