Agentic_RAG

Agentic Retrieval-Augmented Generation (RAG): an AI agent for self-query and query reformulation

Project Objective

To implement and evaluate Agentic Retrieval-Augmented Generation (RAG), comparing its performance against traditional RAG and standalone Large Language Models (LLMs) in answering technical questions about Hugging Face ecosystem packages.

Motivation

The primary motivation behind this project is to explore and demonstrate the advanced capabilities of RAG by incorporating intelligent agents.
Traditional RAG systems, while powerful, often follow a fixed retrieve-then-generate pattern. This project aims to move beyond that by showcasing how an agent-based approach can introduce more dynamic decision-making, iterative refinement, and tool utilization into the RAG pipeline. By implementing an agentic framework, we seek to address the limitations of basic RAG, such as its difficulty handling complex, multi-step queries or adapting to diverse information-retrieval needs.
The implementation focuses on creating an agent that can intelligently interact with external knowledge sources, evaluate retrieved content, and refine its approach based on the outcome, ultimately leading to more accurate, robust, and contextually rich responses from the Large Language Model.

Instructions

To demonstrate the results and run the Agentic RAG system, follow these steps:

  1. Open the Notebook: Launch the agentic_RAG.ipynb notebook in a Jupyter environment (e.g., Jupyter Lab, Jupyter Notebook, VS Code with Jupyter extension, or Google Colab).
  2. Install Dependencies: Ensure all necessary Python packages are installed by running pip install -r requirements.txt.
  3. Set Up Environment Variables: Set your Gemini_API_KEY as an environment variable, either in a .env file or directly in your notebook (see the snippet after this list).
  4. Run All Cells: Execute all cells in the agentic_RAG.ipynb notebook sequentially. This will:
    • Load and preprocess the dataset.
    • Create or load the vector database.
    • Initialize and run the Agentic RAG, Standard RAG, and standalone LLM evaluations.
    • Save the evaluation results to a JSON file in the results directory.
    • Print the average accuracy scores for each system.
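
As referenced in step 3, the snippet below is a minimal sketch of loading the API key from a .env file. It assumes the python-dotenv package is installed; the variable name Gemini_API_KEY matches the one used above.

```python
# Minimal sketch: load Gemini_API_KEY from a .env file in the project root.
# Assumes python-dotenv is installed (pip install python-dotenv).
import os

from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from .env into the process environment
api_key = os.environ["Gemini_API_KEY"]  # raises KeyError if the key is missing
```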

The notebook is designed to handle checkpointing for the agentic RAG evaluation, allowing you to resume if interruptions occur.
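
The exact checkpointing code lives in the notebook; the sketch below only illustrates the general pattern, with a hypothetical helper name (evaluate_with_checkpointing) and an assumed results-file path.

```python
# Hypothetical sketch of the resume-on-interruption pattern: persist every
# answer to a JSON checkpoint so a rerun skips already-evaluated questions.
import json
import os

CHECKPOINT_PATH = "results/agentic_rag_results.json"  # assumed filename

def evaluate_with_checkpointing(questions, answer_fn):
    """Run answer_fn on each question, saving after every answer so an
    interrupted evaluation can resume where it left off."""
    results = {}
    if os.path.exists(CHECKPOINT_PATH):
        with open(CHECKPOINT_PATH) as f:
            results = json.load(f)  # resume from a previous partial run
    os.makedirs("results", exist_ok=True)
    for i, question in enumerate(questions):
        key = str(i)
        if key in results:  # answered before the interruption; skip it
            continue
        results[key] = answer_fn(question)
        with open(CHECKPOINT_PATH, "w") as f:
            json.dump(results, f, indent=2)  # persist progress immediately
    return results
```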

Key Capabilities of This Agentic RAG Implementation

Agentic RAG significantly enhances the RAG pipeline, providing more sophisticated reasoning, planning, and execution capabilities for robustly handling complex information-seeking tasks. This repository leverages the smolagents package to build the underlying agentic framework; a sketch of the agent setup follows.
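
Below is a hedged sketch of how such an agent can be wired up with smolagents. The tool name, the Gemini model id (served through LiteLLM), and the vector-store interface (a LangChain-style similarity_search method) are illustrative assumptions, not the notebook's exact code.

```python
# Illustrative sketch: a documentation retriever exposed as a smolagents Tool,
# driven by a CodeAgent that can re-query and reformulate searches.
import os

from smolagents import CodeAgent, LiteLLMModel, Tool

class RetrieverTool(Tool):
    name = "retriever"
    description = "Retrieves Hugging Face documentation chunks relevant to a query."
    inputs = {"query": {"type": "string", "description": "The search query."}}
    output_type = "string"

    def __init__(self, vector_db, **kwargs):
        super().__init__(**kwargs)
        self.vector_db = vector_db  # assumed to expose similarity_search(query, k)

    def forward(self, query: str) -> str:
        docs = self.vector_db.similarity_search(query, k=5)
        return "\n\n".join(doc.page_content for doc in docs)

def build_agent(vector_db):
    model = LiteLLMModel(
        model_id="gemini/gemini-2.0-flash",  # assumed model id via LiteLLM
        api_key=os.environ["Gemini_API_KEY"],
    )
    # max_steps bounds how many retrieve/reformulate iterations the agent runs.
    return CodeAgent(tools=[RetrieverTool(vector_db)], model=model, max_steps=4)
```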

Parallel Vector Database Creation with Optimized Document Processing

This implementation employs an optimized parallel processing approach for creating vector databases from extensive document collections. In this project, the knowledge base is built from documentation for the packages developed by Hugging Face. The technique combines several performance optimization strategies, described below.

Technical Implementation

This approach is particularly effective for large-scale RAG applications where document preprocessing time is a bottleneck, providing significant speedup while maintaining retrieval quality.
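
As a concrete illustration of the pattern (not the notebook's exact code), the sketch below parallelizes the CPU-bound chunking step across processes and leaves index construction as a single serial pass. The chunk size, overlap, and worker count are assumed values.

```python
# Illustrative sketch: parallel document chunking with a process pool.
from concurrent.futures import ProcessPoolExecutor

def chunk_document(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split one document into overlapping character chunks."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

def chunk_corpus_parallel(documents: list[str], workers: int = 8) -> list[str]:
    """Chunk all documents in parallel; splitting is CPU-bound, so a
    process pool spreads it across cores."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        per_doc = pool.map(chunk_document, documents, chunksize=16)
    return [chunk for chunks in per_doc for chunk in chunks]

# Typical use: chunks = chunk_corpus_parallel(corpus_texts), then embed the
# chunks and build the vector database in a single pass.
```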

Results

Performance was evaluated on the Hugging Face technical Q&A dataset using the Gemini 1.5, Gemini 2.0, and Gemini 2.5 flash models. Agentic RAG consistently achieved higher accuracy than Standard RAG across all models, and the relative ranking of the LLMs stayed the same under both pipelines. As expected, the standalone LLMs scored lowest, with Gemini 1.5 showing the largest deficit. Gemini 2.0 served as the judge, scoring each generated answer for consistency with the ground truth in the dataset.

| Model            | Agentic RAG | Standard RAG | LLM Only |
|------------------|-------------|--------------|----------|
| Gemini-1.5-flash | 91.5%       | 85.4%        | 35.4%    |
| Gemini-2.0-flash | 90.8%       | 85.4%        | 64.1%    |
| Gemini-2.5-flash | 90.8%       | 86.2%        | 63.8%    |

All values above are accuracy scores (in %)
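
For reference, the sketch below shows one way the Gemini 2.0 judging step can be implemented with the google-generativeai client. The prompt wording and the CORRECT/INCORRECT protocol are assumptions, not the notebook's exact judge prompt.

```python
# Hedged sketch of the LLM-as-judge step: Gemini 2.0 decides whether a
# generated answer is consistent with the ground-truth answer.
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["Gemini_API_KEY"])
judge = genai.GenerativeModel("gemini-2.0-flash")

def is_correct(question: str, generated: str, ground_truth: str) -> bool:
    prompt = (
        "You are grading a technical Q&A system.\n"
        f"Question: {question}\n"
        f"Reference answer: {ground_truth}\n"
        f"Candidate answer: {generated}\n"
        "Reply with exactly one word: CORRECT or INCORRECT."
    )
    verdict = judge.generate_content(prompt).text.strip().upper()
    return verdict.startswith("CORRECT")

# Accuracy = fraction of evaluation questions judged CORRECT.
```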

Future Improvements

Future improvements for this project include:

  1. Broaden LLM Evaluation: Expand testing to include a wider variety of LLMs or different agentic RAG architectural patterns (e.g., integrating various tools, multi-agent systems) to assess generalizability and identify optimal configurations.
  2. Refine Agent Prompting: Enhance the system prompts to more precisely guide agent behavior, leading to increased efficiency and better alignment with desired task execution.
  3. Enhance Objective Evaluation Criteria: Develop more rigorous system prompts for evaluating LLM responses, ensuring objectivity, especially concerning conciseness and directness. Responses that are overly verbose or contain extraneous information, even if partially correct, should be scored down appropriately.

Reference:

This repository extends the work of the Hugging Face Agentic RAG Cookbook.