Wen-Chuang Chou

Passionate about AI with roots in theoretical neuroscience,
I explore LLMs, AI agents, and human mobility, driven by a vision of accessible, everyday AI for all.

LLM Distillation & Fine-Tuning

Distilling DeepSeek R1 for Enhanced LLM Performance

This project demonstrates a post-training workflow, run in a distributed HPC environment, that substantially improves the benchmark performance of a large language model.

Brief Technique & Impact

This work post-trains a weaker LLM, Qwen2.5, on high-quality data distilled from DeepSeek R1. Training used Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO) on 8 H100 GPUs spread across 2 HPC nodes. A key part of the project was learning, optimizing, and simplifying the deployment of the training workflow for common HPC setups.
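As a rough illustration of the idea behind GRPO, the sketch below computes group-relative advantages: several completions are sampled for one prompt, each receives a scalar reward, and each reward is normalized against the mean and standard deviation of its own group. The function name, the `eps` constant, the use of the population standard deviation, and the example rewards are assumptions made for illustration. This is not the project's training code, which may rely on a library implementation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group's mean and std.

    `rewards` holds one scalar reward per completion sampled for the
    same prompt. `eps` (an assumed constant) avoids division by zero
    when every completion in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four completions; reward 1.0 means the final
# answer was judged correct, 0.0 means it was not.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Completions that score above their group's average get a positive advantage and are reinforced, while those below it are discouraged. No separate value model is needed to provide the baseline.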

Performance Highlights

Post-training raised the Qwen2.5-Math-7B-Instruct model’s pass@1 accuracy on the AIME 2024 benchmark from 13.3% to 56.7%, and on GPQA Diamond from 28.3% to 54.5%. These gains indicate that training on distilled data can bring a weaker LLM much closer to DeepSeek R1’s performance.

Model                                                                 AIME 2024 pass@1   MATH-500 pass@1   GPQA Diamond pass@1
Qwen2.5-Math-7B-Instruct (Original)                                   13.3               80.2              28.3
Qwen2.5-Math-7B-Instruct (Fine-tuned on DeepSeek R1 distilled data)   56.7               89.8              54.5
DeepSeek-R1-Distill-Qwen-7B (Teacher)                                 53.3               93.2              53.0
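
For readers unfamiliar with the metric, the following sketch shows one common way to compute pass@1 from per-problem results, using the standard unbiased pass@k estimator with k = 1. The source does not state how many samples were drawn per problem, so the single-sample setting, the function names, and the example counts are assumptions for illustration only.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one problem.

    n: samples drawn for the problem, c: samples that were correct.
    For k = 1 this reduces to c / n.
    """
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(correct_counts, n):
    """Average pass@1 over all problems, as a percentage."""
    scores = [pass_at_k(n, c, 1) for c in correct_counts]
    return 100.0 * sum(scores) / len(scores)


# Hypothetical run: 30 problems, one sample each, 17 answered correctly.
correct_counts = [1] * 17 + [0] * 13
print(round(benchmark_pass_at_1(correct_counts, n=1), 1))
```

On a 30-problem set such as AIME 2024, each problem is worth about 3.3 percentage points, so small differences between models on that benchmark correspond to only one or two problems.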

More details can be found in the project repository on GitHub.


Fine-Tuning Llama 3 for Sentiment Analysis

This project fine-tunes Llama 3.1 8B Instruct to classify the sentiment of short-form text such as tweets. The model learns to label each text as positive, neutral, or negative, using the tweet_sentiment_extraction subset of the MTEB benchmark.

Technique & Impact

Using instruction-style prompts and a streamlined training pipeline, the model was fine-tuned to produce single-word sentiment predictions. Fine-tuning raised accuracy well above the zero-shot baseline, making the model better suited to real-world applications such as social media monitoring and customer feedback analysis.
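
The sketch below shows what an instruction-style prompt and a single-word label parser could look like for this task, along with the accuracy calculation. The prompt wording, the function names, and the example generations are hypothetical. The project's actual template and evaluation code are in its repository and may differ.

```python
LABELS = ("positive", "neutral", "negative")


def build_prompt(tweet: str) -> str:
    """Wrap a tweet in an instruction asking for a one-word label.

    The wording is an assumed example, not the project's template.
    """
    return (
        "Classify the sentiment of the following tweet as positive, "
        "neutral, or negative. Answer with a single word.\n\n"
        f"Tweet: {tweet}\nSentiment:"
    )


def parse_label(generation: str):
    """Return the first word of the model output if it is a valid label."""
    words = generation.strip().lower().split()
    if not words:
        return None
    first = words[0].strip(".,!:;\"'")
    return first if first in LABELS else None


def accuracy(predictions, references) -> float:
    """Percentage of predictions that exactly match the reference labels."""
    hits = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)


# Hypothetical model outputs and gold labels, for illustration only.
outputs = [" Positive.", "negative", "I am not sure"]
gold = ["positive", "negative", "neutral"]
predictions = [parse_label(o) for o in outputs]
print(build_prompt("Loving the new update!"))
print(predictions, accuracy(predictions, gold))
```

Counting unparseable outputs as wrong, as this sketch does, keeps the accuracy figure conservative: a model that ignores the single-word instruction is penalized rather than skipped.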

Performance Highlights

On the MTEB tweet sentiment test set, fine-tuning raised accuracy by about 18 percentage points:

Accuracy on MTEB Tweet Sentiment Classification

Model                       Accuracy (%)
Llama 3.1 8B (zero-shot)    63.41
Llama 3.1 8B (fine-tuned)   81.49

Figure: Radar plot showing model performance.

More details can be found in the project repository on GitHub.