My AI Portfolio
Engineering advanced AI, from autonomous multi-agent systems and reasoning-focused LLMs scaled across multi-node GPU clusters to performance profiling and DeepSeek R1 distillation.
Building intelligent AI agents that dynamically reason, retrieve, and self-correct—from Agentic RAG with colocated vLLM inference to tool-augmented reasoning on the GAIA benchmark.
Key projects:
- Agentic RAG — Agent-based retrieval with iterative query refinement, delivering higher answer accuracy than standard RAG and standalone LLM baselines (loop sketch after this list). Details →
- Colocated vLLM Inference — Zero-egress deployment on the GPU cluster with a three-phase hybrid pipeline that cuts end-to-end latency by an order of magnitude (inference snippet after this list). Details →
- GAIA Benchmark — Tool-augmented code agent achieving 40% accuracy, outperforming GPT-4’s 14.4%. Details →
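The retrieval loop at the heart of Agentic RAG, in miniature: retrieve, grade the hits, and rewrite the query when they miss. The helpers below are illustrative stand-ins for the project's actual retriever, LLM call, and relevance grader, not its real interfaces.

```python
from typing import List

def retrieve(query: str) -> List[str]:
    """Stand-in for the project's vector-store retriever."""
    return [f"doc about: {query}"]

def llm(prompt: str) -> str:
    """Stand-in for a colocated vLLM call (see the next sketch)."""
    return f"answer({prompt[:40]}...)"

def relevant(question: str, docs: List[str]) -> bool:
    """Stand-in for an LLM-as-judge relevance grader (naive keyword check here)."""
    return any(question.split()[0].lower() in d.lower() for d in docs)

def agentic_rag(question: str, max_rounds: int = 3) -> str:
    query = question
    for _ in range(max_rounds):
        docs = retrieve(query)
        if relevant(question, docs):
            return llm(f"Answer using context:\n{docs}\n\nQ: {question}")
        # Self-correct: rewrite the query and try retrieval again.
        query = llm(f"Rewrite this search query to find better evidence: {query}")
    return llm(f"Answer as best you can: {question}")  # best-effort fallback

print(agentic_rag("What is agentic RAG?"))
```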
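Colocation means the model is served from the cluster's own GPUs through vLLM's offline API, so prompts and completions never leave the machine (the zero-egress property). A minimal sketch; the model name is an example and the three-phase pipeline itself is not shown.

```python
from vllm import LLM, SamplingParams

# In-cluster, offline inference: the model runs on local GPUs,
# so no request ever crosses the network to an external API.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # example model path
params = SamplingParams(temperature=0.0, max_tokens=256)

outputs = llm.generate(["Summarize the retrieved context ..."], params)
print(outputs[0].outputs[0].text)
```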
Explore all AI Agent projects →
Systematic performance analysis of Transformer architectures—benchmarking FP32 vs. BF16 mixed precision and profiling compute- vs. memory-bound operations in self-attention.
Key projects:
- FP32 vs. BF16 Benchmarking — BF16 mixed precision delivers up to 6× higher inference throughput and unlocks training of larger architectures that fail under FP32 (benchmark sketch after this list). Details →
- Arithmetic Intensity Profiling — Explains why MatMul completes in half the time of Softmax despite 25.6× more FLOPs, illustrating the compute-bound vs. memory-bound divide (worked numbers after this list). Details →
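A minimal timing harness in the spirit of the FP32 vs. BF16 comparison, using torch.autocast for the mixed-precision path. Layer size, batch shape, and step count are illustrative, not the project's actual configuration.

```python
import time
import torch

def bench(dtype=None, steps=50):
    """Throughput of a Transformer layer forward pass, optionally under autocast."""
    model = torch.nn.TransformerEncoderLayer(d_model=1024, nhead=16).cuda().eval()
    x = torch.randn(128, 64, 1024, device="cuda")   # (seq, batch, d_model)
    with torch.no_grad():
        model(x)                                    # warm-up / CUDA init
        torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(steps):
            if dtype is None:
                model(x)                            # plain FP32 baseline
            else:
                with torch.autocast("cuda", dtype=dtype):
                    model(x)                        # BF16 mixed precision
        torch.cuda.synchronize()
    return steps / (time.perf_counter() - t0)       # iterations per second

print(f"FP32: {bench():.1f} it/s  BF16: {bench(torch.bfloat16):.1f} it/s")
```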
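The profiling result follows from arithmetic intensity, i.e. FLOPs per byte of memory traffic: a large MatMul performs hundreds of FLOPs per byte and saturates compute, while Softmax performs well under one FLOP per byte and waits on memory bandwidth. A back-of-envelope version with illustrative shapes, not the project's exact configuration:

```python
# Arithmetic intensity = FLOPs / bytes moved, the quantity that separates
# compute-bound from memory-bound kernels. FP32 elements: 4 bytes each.

def matmul_intensity(m: int, n: int, k: int, elt: int = 4) -> float:
    flops = 2 * m * n * k                        # one multiply-accumulate per output term
    bytes_moved = (m * k + k * n + m * n) * elt  # read A, read B, write C
    return flops / bytes_moved

def softmax_intensity(rows: int, cols: int, elt: int = 4) -> float:
    flops = 5 * rows * cols                      # max, sub, exp, sum, div (≈5 per element)
    bytes_moved = 2 * rows * cols * elt          # read input, write output
    return flops / bytes_moved

print(f"matmul 4096³: {matmul_intensity(4096, 4096, 4096):.0f} FLOPs/byte")   # ~683: compute-bound
print(f"softmax 4096×4096: {softmax_intensity(4096, 4096):.2f} FLOPs/byte")  # ~0.62: memory-bound
```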
Explore all Benchmarking projects →
Advanced post-training and fine-tuning across LLMs and diffusion models—from distilling DeepSeek R1 on multi-node HPC to LoRA-adapted Stable Diffusion.
Key projects:
- DeepSeek R1 Distillation — Boosted Qwen2.5-Math-7B accuracy from 13.3% to 56.7% on AIME 2024 via SFT + GRPO across 8 H100 GPUs (GRPO sketch after this list). Details →
- Llama 3 Sentiment Analysis — Fine-tuned Llama 3.1 8B achieving 81.49% accuracy on MTEB tweet sentiment. Details →
- Stable Diffusion LoRA — Fine-tuned SD v2 with LoRA for Naruto-style generation, cutting training time by 77% through multi-GPU training (LoRA sketch after this list). Details →
- Bike Traffic Prediction — Graph Attention Networks for urban traffic forecasting; 2nd place at BTW 2023. Details →
- Speaker Identification — Transformer/Conformer encoders achieving 91.8% accuracy. Details →
- Anime Face Generator — Diffusion probabilistic model trained on 71k anime faces. Details →
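For the GRPO stage of the distillation recipe, a minimal sketch built on TRL's GRPOTrainer. The reward function, dataset, and hyperparameters here are placeholders, not the project's actual setup; the real pipeline also includes the preceding SFT stage and the multi-node launch configuration.

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def format_reward(completions, **kwargs):
    """Placeholder verifiable reward: 1.0 if the completion gives a boxed
    final answer, R1-style; the project uses an exact-match math verifier."""
    return [1.0 if "\\boxed{" in c else 0.0 for c in completions]

dataset = load_dataset("trl-lib/tldr", split="train")  # placeholder prompt dataset

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-Math-7B",  # the student model named in the project
    reward_funcs=format_reward,
    args=GRPOConfig(output_dir="qwen-grpo", num_generations=8),
    train_dataset=dataset,
)
trainer.train()
```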
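LoRA keeps SD v2's base weights frozen and trains only low-rank adapters on the UNet's attention projections, which is what makes the multi-GPU run cheap. A minimal setup sketch with peft and diffusers; the rank and alpha are illustrative, and the training loop itself is omitted.

```python
import torch
from diffusers import StableDiffusionPipeline
from peft import LoraConfig, get_peft_model

# Attach low-rank adapters to the UNet's attention projections; only the
# adapter weights are trained while the base model stays frozen.
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-base", torch_dtype=torch.float16
)
lora = LoraConfig(
    r=8, lora_alpha=16,  # illustrative rank/scale
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],  # attention projections
)
pipe.unet = get_peft_model(pipe.unet, lora)
pipe.unet.print_trainable_parameters()  # typically well under 1% of UNet weights
```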
Explore all Distillation & Fine-Tuning projects →