R^2AG
R^2AG (Retrieval-to-Retrieval Augmented Generation) is a new framework that improves Retrieval-Augmented Generation (RAG) by reducing the semantic gap between retrievers and large language models (LLMs).
Traditional RAG models retrieve external documents and pass them to an LLM for response generation. However, retrievers and LLMs are trained differently:
- Retrievers focus on finding the most relevant documents.
- LLMs focus on understanding and generating language based on retrieved content.
This difference creates a semantic gap, where the LLM might misinterpret retrieved documents, leading to hallucinations or low-quality responses.
R^2AG solves this by incorporating retrieval information directly into the LLM's generation process, using a trainable R2-Former model and a retrieval-aware prompting strategy.
How R^2AG Works
Step 1: Retrieval Feature Extraction
The retriever fetches relevant documents based on a query.
R^2AG then extracts additional retrieval features from the retriever (a minimal sketch of how these might be computed follows the list):
- Relevance score: How relevant is the document to the query?
- Precedent similarity: How similar is the document to previously retrieved ones?
- Neighbor similarity: How similar is it to nearby documents in the ranking list?
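To make these features concrete, here is a minimal sketch of how they might be computed from dense retriever embeddings, assuming cosine similarity and simple averaging over preceding and adjacent documents; the function name and exact definitions are illustrative, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def extract_retrieval_features(query_emb: torch.Tensor, doc_embs: torch.Tensor) -> torch.Tensor:
    """Per-document retrieval features from dense embeddings.

    query_emb: (d,) query embedding.
    doc_embs:  (k, d) embeddings of the top-k documents, in ranked order.
    Returns a (k, 3) tensor of [relevance, precedent_sim, neighbor_sim].
    """
    q = F.normalize(query_emb, dim=-1)
    d = F.normalize(doc_embs, dim=-1)
    k = d.size(0)

    # Relevance: cosine similarity between each document and the query.
    relevance = d @ q                       # (k,)

    # Pairwise document similarities feed the two structural features.
    pairwise = d @ d.T                      # (k, k)

    # Precedent similarity: mean similarity to documents ranked above this one.
    precedent = torch.zeros(k)
    for i in range(1, k):
        precedent[i] = pairwise[i, :i].mean()

    # Neighbor similarity: mean similarity to adjacent documents in the ranking.
    neighbor = torch.zeros(k)
    for i in range(k):
        idx = [j for j in (i - 1, i + 1) if 0 <= j < k]
        if idx:
            neighbor[i] = pairwise[i, idx].mean()

    return torch.stack([relevance, precedent, neighbor], dim=-1)
```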
Step 2: Processing with R2-Former
- The R2-Former model (a lightweight Transformer) processes retrieval features to understand why each document was retrieved.
- It captures semantic relationships between the query and documents.
- It outputs retrieval-aware embeddings (sketched below).
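The actual R2-Former is trained jointly with the rest of the pipeline; the toy module below only illustrates the shape of the idea: a small Transformer encoder over per-document retrieval features, followed by a projection that aligns the output with the LLM's embedding space. All dimensions and layer counts here are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class R2FormerSketch(nn.Module):
    """Illustrative stand-in for the R2-Former: encodes per-document retrieval
    features and projects them into the LLM's embedding space."""

    def __init__(self, feat_dim: int = 3, hidden: int = 128,
                 llm_dim: int = 4096, layers: int = 2, heads: int = 4):
        super().__init__()
        self.in_proj = nn.Linear(feat_dim, hidden)
        block = nn.TransformerEncoderLayer(d_model=hidden, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(block, num_layers=layers)
        self.out_proj = nn.Linear(hidden, llm_dim)   # alignment into the LLM space

    def forward(self, retrieval_feats: torch.Tensor) -> torch.Tensor:
        # retrieval_feats: (batch, k, feat_dim) for k retrieved documents.
        h = self.encoder(self.in_proj(retrieval_feats))
        return self.out_proj(h)                      # (batch, k, llm_dim)
```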
Step 3: Retrieval-Aware Prompting
Instead of just prepending retrieved documents to the prompt, R^2AG integrates retrieval information directly into the LLM's input embeddings. Each document is paired with its retrieval-aware embedding, which serves as an "anchor" that guides the LLM's focus.
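A rough sketch of what this interleaving could look like with a Hugging Face causal LM: each retrieval-aware embedding is inserted as an extra "token" in front of its document's token embeddings, and the query is appended at the end. The placement and the absence of any prompt template are simplifying assumptions, not the paper's exact strategy.

```python
import torch

def build_retrieval_aware_inputs(llm, tokenizer, query: str, docs: list[str],
                                 anchors: torch.Tensor) -> torch.Tensor:
    """Interleave each document's token embeddings with its retrieval-aware
    'anchor' embedding, then append the query. anchors: (k, llm_dim)."""
    embed = llm.get_input_embeddings()
    pieces = []
    for doc, anchor in zip(docs, anchors):
        doc_ids = tokenizer(doc, return_tensors="pt").input_ids
        pieces.append(anchor.unsqueeze(0))           # (1, llm_dim) anchor before the doc
        pieces.append(embed(doc_ids)[0])             # (T_doc, llm_dim) document tokens
    query_ids = tokenizer(query, return_tensors="pt").input_ids
    pieces.append(embed(query_ids)[0])               # (T_query, llm_dim) query tokens
    return torch.cat(pieces, dim=0).unsqueeze(0)     # (1, T_total, llm_dim)
```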
Step 4: LLM Generation
The LLM generates a response using both the raw text and retrieval-aware embeddings. This reduces confusion and prevents hallucinations by helping the LLM focus on the most relevant documents.
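Continuing the sketches above, generation can then run directly on the combined embeddings, since Hugging Face's `generate` accepts `inputs_embeds` for decoder-only models. The model name, query, and document texts are placeholders, and the dummy features stand in for real retriever outputs.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; any decoder-only Hugging Face LLM works the same way here.
name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(name)
llm = AutoModelForCausalLM.from_pretrained(name)

query = "Which retrieval features does the framework use?"
docs = ["...text of the first retrieved document...",
        "...text of the second retrieved document..."]

# Dummy features stand in for the Step 1 sketch; the R2-Former and the
# embedding builder are the sketches from Steps 2 and 3.
feats = torch.rand(1, len(docs), 3)
anchors = R2FormerSketch(llm_dim=llm.config.hidden_size)(feats)[0]
inputs_embeds = build_retrieval_aware_inputs(llm, tokenizer, query, docs, anchors)

output_ids = llm.generate(inputs_embeds=inputs_embeds, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```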
Step 5: Joint Training (Optional)
- If computational resources allow, R^2AG can fine-tune both the R2-Former and the LLM together for even better performance.
- Otherwise, it works with frozen LLMs, making it a cost-effective upgrade to existing RAG systems (see the sketch after this list).
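As a rough illustration of the frozen-LLM option, only the R2-Former's parameters are handed to the optimizer while the LLM's weights stay frozen; this reuses the names from the earlier sketches and is not the paper's training code.

```python
import torch

# Frozen-LLM setup: train only the R2-Former (plus its alignment projection).
r2former = R2FormerSketch(llm_dim=llm.config.hidden_size)

for p in llm.parameters():
    p.requires_grad = False          # the LLM stays frozen, keeping costs low

optimizer = torch.optim.AdamW(r2former.parameters(), lr=1e-4)

# For joint training, unfreeze the LLM (or attach adapters such as LoRA) and
# add those parameters to the optimizer as well, at a higher compute cost.
```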
R^2AG's code is available on GitHub.
Results of R^2AG
- R^2AG outperforms other RAG methods across multiple tasks.
- Significant improvements in complex reasoning tasks like multi-hop QA.
- Works even when the retriever isn't perfect, helping LLMs focus on useful documents.
- Fine-tuned R^2AG beats all competitors, while the frozen-LLM variant still achieves strong results.
Conclusion
R^2AG is a major advancement in Retrieval-Augmented Generation, offering a smarter, more reliable way to integrate retrieval with generation.
Valeriia Kuka
Valeriia Kuka, Head of Content at Learn Prompting, is passionate about making AI and ML accessible. Valeriia previously grew a 60K+ follower AI-focused social media account, earning reposts from Stanford NLP, Amazon Research, Hugging Face, and AI researchers. She has also worked with AI/ML newsletters and global communities with 100K+ members and authored clear and concise explainers and historical articles.
Footnotes
- Ye, F., Li, S., Zhang, Y., & Chen, L. (2024). R^2AG: Incorporating Retrieval Information into Retrieval Augmented Generation. https://arxiv.org/abs/2406.13249