CV
Curriculum vitae.
Contact Information
| Name | Jiaming Cheng |
| Professional Title | Efficient ML for AIoT and On-Device LLMs |
| Email | jiaming@jiamingcheng.me |
Research Interests
Efficient on-device LLM inference and compression for edge/AIoT at the algorithm and software level — structured pruning, low-bit quantization, knowledge distillation, and reproducible deployment benchmarks for large language models under tight compute, memory, and energy budgets.
Education
-
2020 - 2024 Columbus, Ohio, USA
Research Experience
-
2024 - present Columbus, Ohio, USA
Researcher (advised by Prof. Rajiv Ramnath and Prof. Brijesh Soni)
The Ohio State University
Efficient ML and model compression for edge/AIoT, spanning vision-model structured pruning and on-device LLM inference, across four papers (two published, one to appear, one under review).
- As primary code author, designed and implemented the pruning methods in EPIC and SPICE (an L2 extension of DepGraph, and TaLK, a Taylor + L2 + knowledge-distillation hybrid); trained models at the Ohio Supercomputer Center and deployed them to Raspberry Pi.
- Built a phase-wise on-device LLM inference benchmark on GPU, CPU, and Raspberry Pi, comparing standard Transformers with a sub-quadratic architecture (Qwen3.5 with GatedDeltaNet).
- Authored the experimental sections and produced the analyses and figures; onboarded new lab members onto the codebase and edge-deployment pipeline.
Publications
-
2025 EPIC: Efficient Pruning for Inference on Constrained Devices
Practice and Experience in Advanced Research Computing (PEARC '25)
-
2026 SPICE: Structured Pruning for Inference on Constrained Edge Devices
IEEE Consumer Communications & Networking Conference (CCNC '26)
-
2026 Phase-Wise Analysis of LLM Inference Acceleration on GPU, CPU, and Edge Device
Practice and Experience in Advanced Research Computing (PEARC '26), to appear
-
2026 An Empirical Survey of AI Model Compression Techniques for Edge Deployments
IEEE Internet of Things Journal (under review)
Skills
Programming: Python, Rust, TypeScript
ML & Systems: PyTorch, CUDA, llama.cpp, torch-pruning, bitsandbytes, FlashAttention
Research Engineering: uv, Pydantic, Ruff, ty, Weights & Biases, Typer
Infrastructure: Slurm, Docker, Kubernetes, Ceph, Terraform, Ansible