Passionate about AI with roots in theoretical neuroscience, I explore LLMs, AI agents, and human mobility, driven by a vision of accessible, everyday AI for all.
This project demonstrates a post-training workflow that substantially improves large language model performance in a distributed HPC environment.
This work post-trains a weaker LLM, Qwen2.5, on high-quality data distilled from DeepSeek R1, using Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO) across 8 H100 GPUs distributed over 2 HPC nodes. A key aspect of the project was learning, optimizing, and simplifying the deployment of the training workflow for common HPC setups.
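As a rough illustration of the "group relative" part of GRPO, the sketch below normalizes the rewards of several completions sampled for the same prompt against that group's mean and standard deviation. The function name, the reward values, and the use of the population standard deviation are assumptions for illustration, not details taken from the project's training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the other completions for the same prompt.

    Completions that score above the group mean get a positive advantage,
    those below get a negative one. `eps` avoids division by zero when all
    rewards in the group are identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for 4 sampled answers to one math prompt (1.0 = correct).
rewards = [1.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(rewards))
```

Because the baseline is the group mean, no separate value network is needed to estimate advantages, which is one reason the approach is attractive when GPU memory is limited.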
Post-training yielded substantial gains: the Qwen2.5-Math-7B-Instruct model's pass@1 accuracy rose from 13.3% to 56.7% on the AIME 2024 benchmark and from 28.3% to 54.5% on GPQA Diamond. This demonstrates the effectiveness of the distilled data approach in bringing weaker LLMs closer to DeepSeek R1's performance.
| Model | AIME 2024 pass@1 (%) | MATH-500 pass@1 (%) | GPQA Diamond pass@1 (%) |
|---|---|---|---|
| Qwen2.5-Math-7B-Instruct (Original) | 13.3 | 80.2 | 28.3 |
| Qwen2.5-Math-7B-Instruct (Fine-tuned on DeepSeek R1 distilled data) | 56.7 | 89.8 | 54.5 |
| DeepSeek-R1-Distill-Qwen-7B (Teacher) | 53.3 | 93.2 | 53.0 |
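For readers unfamiliar with the metric, pass@1 is the probability that a single sampled answer is correct, averaged over all problems. A minimal sketch using the widely used unbiased pass@k estimator follows; the helper names and the number of samples per problem are assumptions, since the evaluation settings are not stated here.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate from n sampled answers, c of which are correct."""
    if n - c < k:
        # Every size-k subset of the samples contains at least one correct answer.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(correct_counts, n):
    """Average pass@1 over problems, as a percentage.

    correct_counts[i] is the number of correct answers among the n samples
    drawn for problem i.
    """
    scores = [pass_at_k(n, c, 1) for c in correct_counts]
    return 100.0 * sum(scores) / len(scores)


# Hypothetical run: 4 problems, 4 samples each.
print(benchmark_pass_at_1([4, 0, 2, 1], n=4))
```

For k = 1 the estimator reduces to the fraction of correct samples per problem, so with a single greedy sample it is simply the share of problems solved.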
More details can be found in the project repository on GitHub.
This project fine-tunes Llama 3.1–8B Instruct to perform sentiment classification on short-form text, such as tweets. The model learns to identify sentiments—positive, neutral, or negative—using the tweet_sentiment_extraction subset from the MTEB benchmark.
Using instruction-style prompts and a streamlined training pipeline, the model was fine-tuned to produce accurate, single-word sentiment predictions. Fine-tuning significantly improved accuracy, making the model well suited to real-world applications like social media monitoring and customer feedback analysis.
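The instruction-style setup can be sketched roughly as follows: a prompt template that asks for a single-word answer, a parser that maps the raw generation back to a label, and an accuracy helper. The prompt wording and all function names are hypothetical; the project's actual template may differ.

```python
LABELS = ("positive", "neutral", "negative")


def build_prompt(tweet):
    """Wrap a tweet in a hypothetical single-word classification instruction."""
    return (
        "Classify the sentiment of the tweet as positive, neutral, or negative. "
        "Answer with a single word.\n\n"
        f"Tweet: {tweet}\nSentiment:"
    )


def parse_label(generation):
    """Map raw model output to one of LABELS, or None if no label is found."""
    words = generation.strip().lower().split()
    if not words:
        return None
    first = words[0].strip(".,!?:;\"'")
    return first if first in LABELS else None


def accuracy(predictions, references):
    """Percentage of predictions that exactly match the reference labels."""
    correct = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)


print(build_prompt("Loving the new update!"))
print(parse_label(" Positive.\n"))
```

Treating unparseable outputs as `None` counts them as errors in the accuracy, which keeps a zero-shot baseline from being scored more generously than the fine-tuned model.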
On the MTEB tweet sentiment test set, fine-tuning raised accuracy by about 18 percentage points over the zero-shot baseline:
Accuracy on MTEB Tweet Sentiment Classification
| Model | Accuracy (%) |
|---|---|
| Llama 3.1–8B (zero-shot) | 63.41 |
| Llama 3.1–8B (fine-tuned) | 81.49 |
More details can be found in the project repository on GitHub.
This project fine-tunes Stable Diffusion v2 (by Stability AI) using the Hugging Face Diffusers library to generate images in a customized visual style—in this case, the Naruto anime aesthetic.
The fine-tuning was performed using LoRA (Low-Rank Adaptation), which adapts pretrained diffusion models by adding trainable low-rank matrices alongside the frozen pretrained weights, significantly reducing memory and compute requirements. Training was distributed over 8 H100 GPUs across 2 HPC nodes, achieving a 77% reduction in training time compared to single-GPU training (about a 4.3× speedup).
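To make the low-rank idea concrete, here is a dependency-free sketch of the LoRA update W' = W + (alpha / r) · B · A, where W stays frozen and only the small matrices A and B are trained. The matrix sizes, values, and helper names are illustrative assumptions; in practice this is handled by a library such as Diffusers or PEFT rather than written by hand.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w, a, b, alpha, r):
    """Return W + (alpha / r) * B @ A without modifying the frozen W.

    w: d_out x d_in frozen weight, b: d_out x r, a: r x d_in.
    """
    delta = matmul(b, a)
    scale = alpha / r
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def trainable_params(d_out, d_in, r):
    """Number of trainable values in A and B for one adapted weight matrix."""
    return r * (d_out + d_in)


W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # frozen 2 x 3 weight
A = [[0.1, 0.2, 0.3]]                   # r x d_in, with r = 1
B = [[1.0], [2.0]]                      # d_out x r
print(lora_weight(W, A, B, alpha=2, r=1))
print(trainable_params(1024, 1024, r=4), "trainable vs", 1024 * 1024, "full")
```

B is commonly initialized to zero so that the adapted model starts out identical to the pretrained one, and the parameter count shows why the memory savings are large: rank 4 on a 1024 × 1024 matrix trains under 1% of the original values.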
We used the lambdalabs/naruto-blip-captions dataset to teach the model the distinct visual characteristics of Naruto anime.
Prompt:
A detailed portrait of Hello Kitty, rendered in the style of Naruto anime, with a blue background.
| Model | Output Example |
|---|---|
| Base Stable Diffusion | (image not available) |
| LoRA Fine-Tuned Model | (image not available) |
The base model fails to capture Naruto’s stylistic elements, while the fine-tuned model successfully generates images in the correct anime style.
Note: This project is currently under active development to further improve accuracy and generalization.
I implemented Graph Attention Networks to predict bike traffic volume using social and environmental data. The models were trained separately on datasets from Dresden, Leipzig, and Hamburg. The following plot illustrates the results across the three cities:
This project won second place in the data science challenge at BTW 2023. More details are available on GitHub.
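The core operation of a graph attention layer can be sketched as below: a node scores each neighbor, normalizes the scores with a softmax, and aggregates neighbor features with the resulting weights. This is a simplified single-head version with hypothetical feature vectors and attention parameters, assuming features have already been linearly projected; it is not the project's model code.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def attention_weights(h_i, neighbor_feats, a_src, a_dst):
    """Softmax-normalized attention of node i over its neighbors.

    score_j = LeakyReLU(a_src . h_i + a_dst . h_j), which is equivalent to
    applying one attention vector to the concatenation [h_i || h_j].
    """
    scores = [leaky_relu(dot(a_src, h_i) + dot(a_dst, h_j)) for h_j in neighbor_feats]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(neighbor_feats, weights):
    """Attention-weighted sum of neighbor feature vectors."""
    dim = len(neighbor_feats[0])
    return [sum(w * h[d] for w, h in zip(weights, neighbor_feats)) for d in range(dim)]


# Hypothetical 2-d features for one counting station and three neighbors.
h_i = [0.5, 1.0]
neighbors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = attention_weights(h_i, neighbors, a_src=[0.3, -0.1], a_dst=[0.8, 0.2])
print(weights, aggregate(neighbors, weights))
```

Learned attention lets the model weight neighboring nodes unequally, rather than averaging over all of them as a plain graph convolution would.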
I developed a speaker identification system using Transformer and Conformer encoders, improving accuracy from 53.94% to 91.8% on a validation dataset of 56,666 voice recordings. More details are available on GitHub.
Using a dataset of approximately 71,000 anime face images, I trained a diffusion probabilistic model to generate anime-style portraits. The generative network improved significantly over training iterations, as shown in the images below:
After 1,000 iterations (left) vs. 20,000 iterations (right):
More details are available on GitHub.
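For context on how such a model is trained, the sketch below shows the forward (noising) process of a DDPM-style diffusion model: a clean sample is mixed with Gaussian noise according to a variance schedule, and the network learns to undo that noise. The linear schedule, its endpoint values, and the function names are assumptions chosen for illustration; the project's actual schedule is not stated here.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, linearly spaced (assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product of (1 - beta_s) for s <= t."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0, t, alpha_bars, rng):
    """Sample x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I)."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]


alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
rng = random.Random(0)
pixels = [0.2, -0.5, 0.9]  # hypothetical normalized pixel values
print(q_sample(pixels, 10, alpha_bars, rng))   # still close to the original
print(q_sample(pixels, 999, alpha_bars, rng))  # almost pure noise
```

Generation runs this process in reverse, starting from noise and denoising step by step, which is why sample quality depends so heavily on how well the denoiser has been trained.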