
Self-RAG


Valeriia Kuka

Self-RAG (Self-Reflective Retrieval-Augmented Generation) is an approach that improves Large Language Models (LLMs) by teaching them to decide when to retrieve external knowledge and to assess the quality of their own responses.

Unlike traditional Retrieval-Augmented Generation (RAG) systems, which retrieve a fixed number of documents for every query regardless of need, Self-RAG introduces self-reflection to:

  1. Decide if retrieval is needed before fetching knowledge.
  2. Generate responses while evaluating the relevance of retrieved passages.
  3. Critique its own output to improve factuality and overall quality.

This means the model doesn't just "guess" an answer; it learns to verify and refine its own responses, leading to more accurate, relevant, and reliable outputs.

How Self-RAG Differs from Existing Techniques

| Feature | Traditional RAG | Self-RAG |
| --- | --- | --- |
| Retrieval | Always retrieves a fixed number of documents | Retrieves only when necessary |
| Relevance checking | Uses retrieved docs without verification | Assesses whether retrieved docs are relevant |
| Self-critique | No self-evaluation | Evaluates its own responses for correctness and factuality |
| Customization | No control over output quality | Allows tuning for precision vs. completeness |

Unlike previous methods that either:

  • Always retrieve information (even when unnecessary), or
  • Generate outputs without verification

Self-RAG retrieves only when needed and critically assesses its own outputs.

How Self-RAG Works

Step 1: Retrieve on Demand

  • The model first decides whether retrieval is necessary by emitting a special Retrieve token.
  • If retrieval is needed, it fetches the top-K most relevant documents (see the sketch below).
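
Here is a minimal, self-contained Python sketch of this retrieve-on-demand step. The `needs_retrieval` heuristic and the lexical scorer are illustrative stand-ins of our own: in Self-RAG itself, the decision comes from a trained Retrieve reflection token, and retrieval uses a learned retriever.

```python
# Toy corpus standing in for an external knowledge source.
CORPUS = [
    "Self-RAG trains an LM to emit reflection tokens during generation.",
    "Retrieval-augmented generation fetches documents from a corpus.",
    "The Eiffel Tower is located in Paris.",
]

def needs_retrieval(query: str) -> bool:
    # Stand-in for the model's Retrieve token: retrieve for
    # knowledge-seeking questions, skip otherwise (illustrative only).
    return query.rstrip().endswith("?")

def retrieve_top_k(query: str, k: int = 2) -> list[str]:
    # Toy lexical overlap scorer in place of a real dense retriever.
    def overlap(doc: str) -> int:
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(CORPUS, key=overlap, reverse=True)[:k]

query = "Where is the Eiffel Tower located?"
passages = retrieve_top_k(query) if needs_retrieval(query) else []
print(passages)
```

The point is the control flow: retrieval is a conditional step, not a mandatory one.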

Step 2: Generate and Evaluate in Parallel

  • The model processes multiple retrieved passages simultaneously, drafting a candidate response for each.
  • It evaluates each passage's relevance with an ISREL (Is Relevant) token (see the sketch below).
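
Continuing the sketch, the toy functions below stand in for Self-RAG's per-passage generation and ISREL scoring. Both heuristics are assumptions for illustration; the real model derives ISREL from a trained reflection token, not lexical overlap.

```python
def isrel_score(query: str, passage: str) -> float:
    # Toy relevance proxy: fraction of query tokens found in the passage.
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / max(len(q), 1)

def draft_answer(query: str, passage: str) -> str:
    # Stand-in for the LM generating a response conditioned on one passage.
    return f"Answer to '{query}' grounded in: {passage}"

query = "Where is the Eiffel Tower located?"
passages = [
    "The Eiffel Tower is located in Paris.",
    "Self-RAG trains an LM to emit reflection tokens during generation.",
]

# Each passage is handled independently, so this loop is trivially
# parallelizable, mirroring Self-RAG's simultaneous processing.
candidates = [
    {
        "passage": p,
        "answer": draft_answer(query, p),
        "isrel": isrel_score(query, p),
    }
    for p in passages
]
print(candidates)
```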

Step 3: Critique and Select the Best Response

  • The model critiques its own output using two reflection tokens:
    • ISSUP (Is Supported) token: checks whether the response is backed by the retrieved evidence.
    • ISUSE (Is Useful) token: rates the overall quality of the response.
  • The final response is selected based on these critique scores (see the sketch below).
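
The sketch below completes the pipeline with toy critique scores. The `issup_score` and `isuse_score` functions, and the equal weighting of the three scores, are assumptions for illustration; Self-RAG computes these from the probabilities of its trained reflection tokens.

```python
def issup_score(answer: str, passage: str) -> float:
    # Toy support check: fraction of answer tokens present in the passage.
    a = set(answer.lower().split())
    p = set(passage.lower().split())
    return len(a & p) / max(len(a), 1)

def isuse_score(answer: str) -> float:
    # Toy usefulness proxy: mildly prefer more complete answers.
    return min(len(answer.split()) / 20, 1.0)

def select_best(candidates: list[dict]) -> dict:
    # Combine ISREL, ISSUP, and ISUSE; adjusting these weights is what
    # gives the precision-vs-completeness control mentioned above.
    def total(c: dict) -> float:
        return (
            c["isrel"]
            + issup_score(c["answer"], c["passage"])
            + isuse_score(c["answer"])
        )
    return max(candidates, key=total)

candidates = [
    {"passage": "The Eiffel Tower is located in Paris.",
     "answer": "The Eiffel Tower is located in Paris.",
     "isrel": 0.8},
    {"passage": "Self-RAG trains an LM to emit reflection tokens.",
     "answer": "It is somewhere in France.",
     "isrel": 0.2},
]
print(select_best(candidates))  # picks the evidence-backed candidate
```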

Conclusion

Self-RAG is a promising approach for making LLMs more factually accurate, reliable, and controllable. By retrieving only when needed and self-reflecting on its outputs, it outperforms standard RAG pipelines and, in the authors' evaluations, even beats proprietary models like ChatGPT on factual accuracy tasks.


Footnotes

  1. Asai, A., Wu, Z., Wang, Y., Sil, A., & Hajishirzi, H. (2023). Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection. https://arxiv.org/abs/2310.11511