Agentic_RAG

Agentic Retrieval-Augmented Generation (RAG): an AI agent for self-querying and query reformulation

Project Description

This project implements and evaluates Agentic Retrieval-Augmented Generation (RAG), comparing its performance against traditional RAG and standalone Large Language Models (LLMs) when answering technical questions about the Hugging Face ecosystem.

Traditional RAG systems are powerful, but they follow a fixed retrieve-then-generate pattern. This project goes further with an agent-based approach that enables dynamic decision-making, iterative query refinement, and adaptive tool use, addressing core limitations of basic RAG on complex or multi-step queries. The agent interacts with external knowledge sources, evaluates the retrieved content, and refines its strategy based on the outcome, which yields more accurate, robust, and contextually rich answers. This repository uses the smolagents package to build the underlying agentic framework.

[Figure: RAG vs. Agentic RAG]

Key Capabilities

| Capability | Description |
| --- | --- |
| Query Strategy & Refinement | Strategically determines and combines keywords for search queries, iteratively refining them based on retrieval results to optimize relevance and coverage. |
| Iterative Query Refinement | If initial retrieval is insufficient, the agent reformulates queries or expands the number of retrieved documents. |
| Document Evaluation | Assesses the relevance and quality of retrieved information relative to the question before generating an answer. |
| Multi-step Reasoning | Chains together multiple retrieval and generation steps to answer complex questions. |
| Self-Correction & Backtracking | If a generated answer is unsatisfactory, the agent devises and executes alternative retrieval strategies. |
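
The iterative refinement capability listed above can be illustrated with a small, self-contained sketch. It is not the repository's implementation: `retrieve`, `refine_until_sufficient`, the toy `CORPUS`, and the `is_sufficient` callback are hypothetical stand-ins for the vector-store retriever and the LLM-driven judgment that the agent performs. The sketch only shows the control flow: try a query, check the result, reformulate, and widen the number of retrieved documents if nothing works.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical toy corpus standing in for the indexed documentation.
CORPUS = [
    "The Trainer class handles the training loop in transformers",
    "Use push_to_hub to upload a model to the Hugging Face Hub",
    "Datasets can be streamed with streaming=True in load_dataset",
]


def retrieve(query: str, corpus: List[str], k: int) -> List[str]:
    """Toy keyword retriever: rank documents by the number of shared words."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(doc.lower().split())), doc) for doc in corpus]
    scored = [(score, doc) for score, doc in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def refine_until_sufficient(
    queries: List[str],
    corpus: List[str],
    is_sufficient: Callable[[List[str]], bool],
    k: int = 1,
    max_k: int = 3,
) -> Tuple[Optional[str], int, List[str]]:
    """Try each reformulated query; if none suffices, retrieve more documents.

    `queries` stands in for the reformulations an LLM agent would produce.
    """
    while k <= max_k:
        for query in queries:
            docs = retrieve(query, corpus, k)
            if is_sufficient(docs):
                return query, k, docs
        k += 1  # widen the search before giving up
    return None, max_k, []
```

For example, calling `refine_until_sufficient(["how do I share my weights", "upload model hub"], CORPUS, lambda docs: any("push_to_hub" in d for d in docs))` finds nothing for the first phrasing and succeeds with the second, keyword-focused one.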

Installation

Prerequisites

Steps

  1. Clone the repository
    git clone https://github.com/Wen-ChuangChou/Agentic_RAG.git
    cd Agentic_RAG
    
  2. Create and activate a virtual environment (recommended)
    python -m venv venv
    # Windows
    venv\Scripts\activate
    # macOS / Linux
    source venv/bin/activate
    
  3. Install dependencies
    pip install -r requirement.txt
    
  4. Configure your API keys

    Create a .env file in the project root and add:

    GEMINI_API_KEY=your_api_key_here
    Blablador_API_KEY=your_api_key_here
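
As a rough illustration of how the two keys above could be read at startup, the sketch below parses a `.env` file with the standard library only. The helper names `load_env_file` and `require_keys` are hypothetical; the repository may well rely on a package such as python-dotenv instead, so treat this as one possible approach rather than the project's actual loading code.

```python
import os
from pathlib import Path
from typing import Dict, Iterable


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse KEY=VALUE lines and export them without overriding existing variables."""
    env_path = Path(path)
    if not env_path.exists():
        return {}
    loaded: Dict[str, str] = {}
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and lines that are not assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        os.environ.setdefault(key, value)
        loaded[key] = value
    return loaded


def require_keys(names: Iterable[str]) -> None:
    """Fail early with a clear message if any expected key is missing or empty."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

A script could call `load_env_file()` followed by `require_keys(["GEMINI_API_KEY", "Blablador_API_KEY"])` before creating any model client, so that a missing key is reported immediately rather than halfway through an evaluation run.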
    

Usage

There are three main scripts to run the evaluation pipeline and visualize the results.


1. Run the Evaluation — agentic_rag.py

This is the core script. It evaluates and compares three QA systems (Agentic RAG, Standard RAG, and Vanilla LLM) on the Hugging Face technical Q&A dataset, using an LLM-as-judge for scoring. Results are saved as JSON files in the results/ directory, and checkpoints are written to checkpoints/ so long runs can be safely resumed.

python agentic_rag.py

What it does:

The model name and chunk size are configurable inside main() via the config dictionary and the model_name variable.
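
The checkpoint-and-resume behaviour mentioned for this script can be sketched as follows. This is a minimal illustration under assumptions, not the logic in utils/checkpoint_runner.py: the function name `run_with_checkpoints`, the JSON layout (a mapping from item index to result), and the `evaluate` callback are all hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Callable, List, Sequence


def run_with_checkpoints(
    items: Sequence[Any],
    evaluate: Callable[[Any], Any],
    checkpoint_path: str,
) -> List[Any]:
    """Evaluate items one by one, persisting progress so a rerun skips finished work."""
    path = Path(checkpoint_path)
    # Load earlier progress if a checkpoint exists (assumed layout: {"index": result}).
    done = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}

    for index, item in enumerate(items):
        key = str(index)
        if key in done:
            continue  # already evaluated in a previous run
        done[key] = evaluate(item)
        # Write to a temporary file first, then swap it in, so an interrupted
        # write does not leave a truncated checkpoint behind.
        path.parent.mkdir(parents=True, exist_ok=True)
        tmp_path = path.with_suffix(".tmp")
        tmp_path.write_text(json.dumps(done), encoding="utf-8")
        tmp_path.replace(path)

    return [done[str(index)] for index in range(len(items))]
```

Saving after every item trades a little I/O for the guarantee that at most one evaluation is lost on interruption, which matters when each item involves several LLM calls.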

Vector database pipeline (utils/vectordb_utils.py):

| Feature | Detail |
| --- | --- |
| Parallel splitting | Documents split concurrently via ThreadPoolExecutor for large-scale speed-up |
| Batch embedding | Embeds in configurable batches (default 100), merging FAISS shards incrementally to manage memory |
| Thread-safe processing | DocumentProcessor uses threading locks to prevent race conditions |
| Intelligent fallback | Automatically falls back to sequential processing if parallel execution fails |
| Deduplication | Removes duplicate documents by content hash before indexing |
| Persistent caching | Saves/loads the FAISS index from vectordb/ to skip expensive recomputation on repeat runs |
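
Several of the features above (deduplication by content hash, concurrent splitting with a sequential fallback, and batching before embedding) can be sketched with the standard library alone. The code below is an illustration under assumptions, not the contents of utils/vectordb_utils.py: the function names, the fixed-width character splitter, and the choice of SHA-256 are hypothetical, and the embedding and FAISS steps are left out entirely.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List


def deduplicate(texts: List[str]) -> List[str]:
    """Keep the first occurrence of each text, compared by content hash."""
    seen = set()
    unique = []
    for text in texts:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique


def split_text(text: str, chunk_size: int = 200) -> List[str]:
    """Naive fixed-width splitter standing in for a real text splitter."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def split_all(docs: List[str], chunk_size: int = 200, max_workers: int = 4) -> List[str]:
    """Split documents concurrently; fall back to a sequential loop on failure."""
    try:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # pool.map preserves input order, so chunks stay grouped by document.
            parts = list(pool.map(lambda doc: split_text(doc, chunk_size), docs))
    except Exception:
        parts = [split_text(doc, chunk_size) for doc in docs]
    return [chunk for part in parts for chunk in part]


def batched(items: List[str], batch_size: int = 100) -> Iterator[List[str]]:
    """Yield fixed-size batches, e.g. to embed and index one batch at a time."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

In a pipeline of this shape, each batch would be embedded and added to the index before the next one is processed, which keeps peak memory bounded by the batch size rather than by the corpus size.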

[!TIP] Use a GPU to create the vector store. Embedding generation is heavily compute-bound: on this dataset, an H100 GPU completes the full build in about 14 seconds, while an AMD Ryzen 5 5600X CPU takes more than 23 minutes. If a GPU is available, make sure your torch installation is CUDA-enabled; the pipeline will then use it automatically.
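
To check whether the installed torch build can actually see a GPU before starting a long index build, a quick test along these lines may help. The helper name `pick_device` is hypothetical and this is not how the repository selects its device; it only wraps the standard `torch.cuda.is_available()` check and falls back to the CPU when torch is missing.

```python
def pick_device() -> str:
    """Return "cuda" if a CUDA-enabled torch build can see a GPU, else "cpu"."""
    try:
        import torch  # imported lazily so the check also works without torch installed
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

If this returns "cpu" on a machine with a GPU, the usual cause is a CPU-only torch wheel, and reinstalling a CUDA-enabled build is worth doing before building the vector store.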


2. Visualize Performance Scores — visualize_rag_performance.py

Generates a grouped bar chart comparing the mean accuracy (%) of Agentic RAG, Standard RAG, and Vanilla LLM across all JSON result files found in the results directory. The plot is saved as evaluation_scores.png.

python visualize_rag_performance.py
# or specify a custom results directory:
python visualize_rag_performance.py --results_dir path/to/results

Output: results/evaluation_scores.png


3. Visualize Score Distribution — visualize_correct_portion.py

Generates a stacked bar chart showing the proportion of Correct, Partially correct, and Wrong answers for each system type and model. This gives a richer view of answer quality beyond simple accuracy. The plot is saved as score_distribution.png.

python visualize_correct_portion.py
# or specify a custom results directory:
python visualize_correct_portion.py --results_dir path/to/results

Output: results/score_distribution.png
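
The proportions behind such a stacked bar chart can be computed in a few lines. The sketch below assumes that each evaluated answer carries one of the three labels named above as a plain string; the actual JSON schema of the result files is not shown here, so the `LABELS` tuple and the `score_distribution` helper are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

# Assumed label strings, taken from the three categories described above.
LABELS = ("Correct", "Partially correct", "Wrong")


def score_distribution(labels: Iterable[str]) -> Dict[str, float]:
    """Return the percentage of answers in each category (unknown labels are ignored)."""
    counts = Counter(labels)
    total = sum(counts[label] for label in LABELS)
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: 100.0 * counts[label] / total for label in LABELS}
```

For instance, four answers labelled Correct, Correct, Wrong, and Partially correct give 50% Correct, 25% Partially correct, and 25% Wrong; one such dictionary per system and model provides the segment heights of each bar.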


Project Structure

RAG_paper/
│
├── agentic_rag.py                  # Main evaluation script (Agentic RAG, Standard RAG, Vanilla LLM)
├── visualize_rag_performance.py    # Grouped bar chart of mean accuracy scores
├── visualize_correct_portion.py    # Stacked bar chart of score distribution
├── requirement.txt                 # Python dependencies
├── .env                            # API keys (not tracked by git)
│
├── utils/                          # Helper modules
│   ├── agent_tools.py              # RetrieverTool for the smolagents CodeAgent
│   ├── blablador_helper.py         # Blablador LLM API wrapper
│   ├── checkpoint_runner.py        # Checkpointing logic for long evaluations
│   ├── results_manager.py          # Save / load evaluation results to JSON
│   └── vectordb_utils.py           # FAISS vector database creation & caching (gte-small embeddings, cosine distance, parallel splitting, persistent cache)
│
└── prompts/                        # YAML prompt templates
    ├── gemini_agent_system_prompt.yaml
    ├── guide_agent_system_prompt.yaml
    └── evaluation_prompt.yaml

Results

Performance was evaluated using the Hugging Face technical Q&A dataset. The results demonstrate that the Agentic RAG approach consistently outperforms standard RAG and standalone LLMs.

Performance Comparison

[Figure: Evaluation scores]

Agentic RAG delivers the highest accuracy for every evaluated model, outperforming Standard RAG by a clear margin and significantly surpassing the Vanilla LLM. This suggests that the pipeline design (the agentic workflow) affects performance more than the choice of base model, and that single-pass retrieval (Standard RAG) is not sufficient for high-quality technical QA.

Score Distribution and Model Strength

[Figure: Score distribution]

The primary advantage of Agentic RAG comes from drastically reducing incorrect answers, not just increasing partially correct ones. Compared to Standard RAG and Vanilla LLM, it shifts outcomes from “wrong” directly to “correct,” demonstrating that iterative retrieval and reasoning improve answer reliability and suppress hallucinations, rather than merely producing safer or more ambiguous responses.

Extended Project

Private LLM Serving with vLLM: We have moved from external APIs (such as Gemini or Blablador) to local, HPC-hosted models served with vLLM. This extension significantly improves inference speed and enforces strict data privacy by keeping sensitive information inside a secure, private GPU computing cluster, a critical requirement for production-grade applications.

Future Improvements

To further enhance the performance, efficiency, and security of this Agentic RAG system, the following areas are identified for future development:

  1. Comprehensive Agent Telemetry: System prompts are a critical factor in agentic performance. A robust telemetry system would allow granular monitoring of agent behavior, enabling systematic comparison of different system prompts and reasoning patterns to identify the most effective configurations.

  2. Self-Refining Agent Prompting via Reinforcement Learning: Use reinforcement learning (RL) to let an agent iteratively refine its own system prompts. The goal is to optimize for factual accuracy while preserving the agent's existing capabilities. This approach could lead to more efficient retrieval strategies, reducing the number of steps needed and improving alignment with complex task requirements.

References:

  1. Hugging Face Agentic RAG Cookbook.
  2. Blablador.