
R^2AG

🟦 This article is rated medium
Reading Time: 2 minutes
Last updated on March 2, 2025

Valeriia Kuka

R^2AG (Retrieval-to-Retrieval Augmented Generation) is a new framework that improves Retrieval-Augmented Generation (RAG) by reducing the semantic gap between retrievers and large language models (LLMs).

Traditional RAG models retrieve external documents and pass them to an LLM for response generation. However, retrievers and LLMs are trained differently:

  • Retrievers focus on finding the most relevant documents.
  • LLMs focus on understanding and generating language based on retrieved content.

This difference creates a semantic gap, where the LLM might misinterpret retrieved documents, leading to hallucinations or low-quality responses.

R^2AG solves this by incorporating retrieval information directly into LLM generation using a trainable R2-Former model and a retrieval-aware prompting strategy.

How R^2AG Works

Step 1: Retrieval Feature Extraction

The retriever fetches relevant documents based on a query.

R^2AG extracts additional retrieval features (sketched in code after this list):

  • Relevance score (r): How relevant is the document to the query?
  • Precedent similarity (γ): How similar is the document to previously retrieved ones?
  • Neighbor similarity (ζ): How similar is it to nearby documents in the ranking list?
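The exact formulations live in the paper, but as a rough illustration, these three features could be computed from a dense retriever's query and document embeddings along these lines (a PyTorch sketch; the cosine-similarity choice and the averaging over preceding/adjacent documents are assumptions, not the paper's exact definitions):

```python
import torch
import torch.nn.functional as F

def retrieval_features(query_emb: torch.Tensor, doc_embs: torch.Tensor) -> torch.Tensor:
    """query_emb: (d,) query vector; doc_embs: (k, d) top-k documents in ranked order."""
    k = doc_embs.size(0)

    # Relevance score r: similarity between the query and each document.
    r = F.cosine_similarity(query_emb.unsqueeze(0), doc_embs, dim=-1)                 # (k,)

    # Pairwise document-document similarities.
    sims = F.cosine_similarity(doc_embs.unsqueeze(1), doc_embs.unsqueeze(0), dim=-1)  # (k, k)

    gamma, zeta = [], []
    for i in range(k):
        # Precedent similarity γ: mean similarity to documents ranked before this one.
        gamma.append(sims[i, :i].mean() if i > 0 else sims.new_zeros(()))
        # Neighbor similarity ζ: mean similarity to directly adjacent documents in the ranking.
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < k]
        zeta.append(sims[i, neighbors].mean() if neighbors else sims.new_zeros(()))

    # One (r, γ, ζ) feature vector per retrieved document.
    return torch.stack([r, torch.stack(gamma), torch.stack(zeta)], dim=-1)            # (k, 3)
```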

Step 2: Processing with R2-Former

  • The R2-Former model (a lightweight Transformer) processes retrieval features to understand why each document was retrieved.
  • It captures semantic relationships between the query and documents.
  • It outputs retrieval-aware embeddings.
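A minimal sketch of an R2-Former-style module (the layer sizes, depth, and projection here are illustrative assumptions, not the paper's exact architecture): a small Transformer encoder that turns the per-document retrieval features into embeddings sized for the LLM's input space.

```python
import torch
import torch.nn as nn

class R2Former(nn.Module):
    """Lightweight Transformer mapping retrieval features to retrieval-aware embeddings."""

    def __init__(self, n_features: int = 3, hidden: int = 128, llm_dim: int = 4096):
        super().__init__()
        self.input_proj = nn.Linear(n_features, hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.output_proj = nn.Linear(hidden, llm_dim)  # align with the LLM's embedding space

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        """features: (batch, k, n_features) -> (batch, k, llm_dim)"""
        x = self.input_proj(features)
        x = self.encoder(x)           # attention across the k retrieved documents
        return self.output_proj(x)    # one retrieval-aware embedding per document
```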

Step 3: Retrieval-Aware Prompting

Instead of just prepending retrieved documents, R^2AG integrates retrieval information into the input embeddings of the LLM. Each document is paired with its retrieval-aware embedding, which serves as an "anchor" to guide the LLM's focus.
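A sketch of how this pairing could look at the embedding level (an assumption about the mechanics, not the paper's exact implementation): each document's token embeddings are preceded by its retrieval-aware anchor embedding, and the query text follows at the end.

```python
import torch

def build_inputs_embeds(doc_token_embs, retrieval_embs, query_token_embs):
    """
    doc_token_embs:   list of k tensors, each (len_i, llm_dim) - token embeddings of a document
    retrieval_embs:   (k, llm_dim) retrieval-aware embeddings from the R2-Former
    query_token_embs: (len_q, llm_dim) token embeddings of the user query
    """
    pieces = []
    for doc_emb, anchor in zip(doc_token_embs, retrieval_embs):
        pieces.append(anchor.unsqueeze(0))   # anchor embedding for this document
        pieces.append(doc_emb)               # the document's raw token embeddings
    pieces.append(query_token_embs)          # query comes last
    return torch.cat(pieces, dim=0).unsqueeze(0)   # (1, total_len, llm_dim)
```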

Step 4: LLM Generation

The LLM generates a response using both the raw text and retrieval-aware embeddings. This reduces confusion and prevents hallucinations by helping the LLM focus on the most relevant documents.
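A usage sketch, assuming a Hugging Face causal LM whose generate() accepts inputs_embeds (supported by recent decoder-only models such as Llama), and continuing the earlier sketches (R2Former, build_inputs_embeds, and the feature matrix features); the model choice and the retrieved_docs/query variables are illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"        # illustrative backbone
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
embed = model.get_input_embeddings()           # the LLM's token-embedding layer

def embed_text(text: str) -> torch.Tensor:
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    return embed(ids)                          # (len, llm_dim)

doc_token_embs = [embed_text(d) for d in retrieved_docs]        # raw document text
query_token_embs = embed_text(query)

r2_former = R2Former(llm_dim=model.config.hidden_size)
retrieval_embs = r2_former(features.unsqueeze(0))[0]            # (k, llm_dim) anchors

inputs_embeds = build_inputs_embeds(doc_token_embs, retrieval_embs, query_token_embs)
attention_mask = torch.ones(inputs_embeds.shape[:2], dtype=torch.long)

output_ids = model.generate(inputs_embeds=inputs_embeds,
                            attention_mask=attention_mask,
                            max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```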

Step 5: Joint Training (Optional)

  • If computational resources allow, R^2AG can fine-tune both the R2-Former and the LLM together for even better performance.
  • Otherwise, it works with frozen LLMs, making it a cost-effective upgrade to existing RAG systems (see the sketch below).
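A minimal sketch of the frozen-LLM setup, continuing the earlier code (the optimizer choice and learning rate are illustrative assumptions):

```python
import torch

# Cost-effective setup: keep the LLM frozen so only the R2-Former receives gradients.
for p in model.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.AdamW(r2_former.parameters(), lr=1e-4)

# Training then minimizes the usual next-token (language-modeling) loss on the target
# answers, back-propagating only through the retrieval-aware anchor embeddings into
# the R2-Former. For full joint training, leave the LLM unfrozen and add
# model.parameters() to the optimizer as well.
```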
Note

R^2AG code is available on GitHub.

Results of R^2AG

  • R^2AG outperforms other RAG methods across multiple tasks.
  • Significant improvements in complex reasoning tasks like multi-hop QA.
  • Works even when the retriever isn't perfect, helping LLMs focus on useful documents.
  • Fine-tuned R^2AG beats all competitors, while frozen R^2AG still achieves strong results.

Conclusion

R^2AG is a major advancement in Retrieval-Augmented Generation, offering a smarter, more reliable way to integrate retrieval with generation.

Valeriia Kuka

Valeriia Kuka, Head of Content at Learn Prompting, is passionate about making AI and ML accessible. Valeriia previously grew a 60K+ follower AI-focused social media account, earning reposts from Stanford NLP, Amazon Research, Hugging Face, and AI researchers. She has also worked with AI/ML newsletters and global communities with 100K+ members and authored clear and concise explainers and historical articles.

Footnotes

  1. Ye, F., Li, S., Zhang, Y., & Chen, L. (2024). R^2AG: Incorporating Retrieval Information into Retrieval Augmented Generation. https://arxiv.org/abs/2406.13249