Welcome to the repository for fine-tuning Llama 3 for sentiment analysis. This project adapts Llama 3.1-8B Instruct, a state-of-the-art language model, to accurately classify sentiment in short text data.
Sentiment analysis is a fundamental task in natural language processing (NLP) that involves identifying the emotional tone of a given text. This project fine-tunes Llama 3.1-8B Instruct to classify text as positive, negative, or neutral.
The dataset used for fine-tuning and evaluation is from the Massive Text Embedding Benchmark (MTEB):

- Dataset: mteb/tweet_sentiment_extraction (available on Hugging Face)
- Labels: positive, neutral, negative

We first evaluated the performance of Llama 3.1-8B Instruct without fine-tuning using the following prompt:
"Analyze the sentiment of the following text. Respond with exactly one word: either 'positive', 'negative', or 'neutral'."
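For reference, a zero-shot baseline along these lines can be scripted with the Hugging Face datasets and transformers libraries. This is a minimal sketch, assuming the gated meta-llama/Meta-Llama-3.1-8B-Instruct checkpoint and a recent transformers version that accepts chat-style messages; it is not necessarily the exact evaluation harness used here:

```python
# Zero-shot sentiment baseline: a minimal sketch, assuming access to the
# gated meta-llama/Meta-Llama-3.1-8B-Instruct checkpoint and greedy decoding.
import torch
from datasets import load_dataset
from transformers import pipeline

PROMPT = (
    "Analyze the sentiment of the following text. Respond with exactly "
    "one word: either 'positive', 'negative', or 'neutral'."
)

generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# The MTEB tweet sentiment dataset; the "text" column name is an assumption.
dataset = load_dataset("mteb/tweet_sentiment_extraction", split="test")

def classify(text: str) -> str:
    """Ask the instruct model for a one-word sentiment label."""
    messages = [
        {"role": "system", "content": PROMPT},
        {"role": "user", "content": text},
    ]
    out = generator(messages, max_new_tokens=5, do_sample=False)
    # The pipeline returns the full chat; the last message is the reply.
    return out[0]["generated_text"][-1]["content"].strip().lower()

print(classify(dataset[0]["text"]))  # e.g. "negative"
```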
Fine-tuning led to a substantial improvement in sentiment classification accuracy. In addition to overall performance, the model demonstrated marked gains in precision, recall, and F1-score across all sentiment categories.
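The exact training recipe is not reproduced here; the sketch below shows one common approach, parameter-efficient fine-tuning with LoRA via peft and trl. The hyperparameters, prompt format, and dataset column names (text, label_text) are illustrative assumptions rather than this project's configuration:

```python
# Illustrative LoRA fine-tuning sketch using peft + trl; hyperparameters
# and prompt format are assumptions, not the project's exact recipe.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("mteb/tweet_sentiment_extraction", split="train")

def to_prompt(example):
    # Turn each tweet into an instruction -> one-word-label training text.
    return {
        "text": (
            "Analyze the sentiment of the following text. Respond with "
            "exactly one word: either 'positive', 'negative', or 'neutral'.\n"
            f"Text: {example['text']}\nSentiment: {example['label_text']}"
        )
    }

dataset = dataset.map(to_prompt)

trainer = SFTTrainer(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    train_dataset=dataset,
    peft_config=LoraConfig(
        r=16, lora_alpha=32, lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],
    ),
    args=SFTConfig(
        output_dir="llama31-sentiment-lora",
        dataset_text_field="text",
        num_train_epochs=1,
        per_device_train_batch_size=4,
        learning_rate=2e-4,
    ),
)
trainer.train()
```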
The table below summarizes the detailed evaluation metrics on the MTEB tweet sentiment test set, comparing results before and after fine-tuning:
| Metric | Before Fine-Tuning | After Fine-Tuning |
|---|---|---|
| Accuracy | 63.41% | 81.49% |
| Negative Precision | 64.14% | 79.05% |
| Negative Recall | 76.21% | 84.42% |
| Negative F1-score | 69.66% | 81.64% |
| Neutral Precision | 63.21% | 79.42% |
| Neutral Recall | 49.22% | 76.64% |
| Neutral F1-score | 55.34% | 78.01% |
| Positive Precision | 65.24% | 86.54% |
| Positive Recall | 72.78% | 85.13% |
| Positive F1-score | 68.80% | 85.83% |
These results highlight the effectiveness of fine-tuning in enabling more nuanced and reliable sentiment predictions, especially for challenging neutral and negative cases.
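Per-class metrics like those in the table can be computed with scikit-learn's classification_report. A small sketch, assuming y_true and y_pred hold the string labels collected during the evaluation loop (the placeholder values below are for illustration only):

```python
# Compute accuracy and per-class precision/recall/F1 from predictions;
# y_true / y_pred are assumed to hold string labels from the eval loop.
from sklearn.metrics import accuracy_score, classification_report

y_true = ["negative", "neutral", "positive", "neutral"]   # placeholder
y_pred = ["negative", "neutral", "positive", "negative"]  # placeholder

print(f"Accuracy: {accuracy_score(y_true, y_pred):.2%}")
print(classification_report(
    y_true, y_pred,
    labels=["negative", "neutral", "positive"],
    digits=4,
))
```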
The radar chart below visualizes the improvements across key performance metrics:
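For anyone regenerating the figure, a chart of this kind can be drawn with matplotlib; a minimal sketch using the before/after values from the table above:

```python
# Radar chart of before/after metrics; values are taken from the table above.
import numpy as np
import matplotlib.pyplot as plt

metrics = ["Accuracy", "Neg P", "Neg R", "Neg F1", "Neu P",
           "Neu R", "Neu F1", "Pos P", "Pos R", "Pos F1"]
before = [63.41, 64.14, 76.21, 69.66, 63.21, 49.22, 55.34, 65.24, 72.78, 68.80]
after = [81.49, 79.05, 84.42, 81.64, 79.42, 76.64, 78.01, 86.54, 85.13, 85.83]

# One angle per metric; repeat the first point to close each polygon.
angles = np.linspace(0, 2 * np.pi, len(metrics), endpoint=False).tolist()
angles += angles[:1]

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
for values, label in ((before, "Before fine-tuning"),
                      (after, "After fine-tuning")):
    vals = values + values[:1]
    ax.plot(angles, vals, label=label)
    ax.fill(angles, vals, alpha=0.15)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(metrics, fontsize=8)
ax.legend(loc="lower right")
plt.savefig("radar_chart.png", bbox_inches="tight")
```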