The Prompt Report: Insights from The Most Comprehensive Study of Prompting Ever Done
4 minutes
I recently led a team of 32 researchers from top institutions like OpenAI, Google, and Stanford to systematically analyze 1,500+ academic papers on prompting. As a result, we created The Prompt Report, an 80+ page survey that’s the most comprehensive exploration of prompting techniques ever published.
But we didn’t stop there.
At Learn Prompting, we expanded on the report’s findings to create the most comprehensive Prompt Engineering Guide with prompting tips for every level. Specifically, we created documentation for all 58 prompting techniques detailed in The Prompt Report.
Here’s what we discovered about how to write better prompts, why it matters, and how you can apply these findings in your own work.
A Taxonomy of Prompting Techniques: 6 Categories, 58 Methods
Our research introduced a structured taxonomy of 58 text-based prompting techniques, grouped into 6 problem-solving categories.
- Few-Shot Prompting: Improve results by showing examples.
- Thought Generation: Encourage reasoning with methods like Chain-of-Thought (CoT) and Thread-of-Thought (ThoT).
- Zero-Shot Prompting: Skip examples; rely on precise instructions.
- Ensembling: Combine outputs from multiple prompts to enhance reliability.
- Self-Criticism: Ask the AI to critique and refine its own responses.
- Decomposition: Simplify complex tasks into smaller, more manageable problems.
Each technique comes with a detailed guide on Learn Prompting, explaining:
- What it does and when to use it.
- How it differs from similar techniques.
- Practical tips, templates, and real-world examples.
Start with Few-Shot and Chain-of-Thought techniques for impactful improvements in reasoning and task performance.
What Makes Exemples So Powerful in Prompt Design? In-Context Learning (ICL)
In-Context Learning (ICL), first identified in 2020, is one of AI’s most powerful yet enigmatic capabilities. It allows large language models (LLMs) to perform new tasks based solely on the examples and instructions provided in a prompt without retraining. This approach often relies on the few-shot prompting technique.
The Art of Exemplars
Crafting effective prompts depends on exemplars - examples provided within the prompt. Six key factors influence their effectiveness:
- Quantity: More is usually better, up to a limit.
- Order: Avoid recency bias by alternating positive and negative examples.
- Label Distribution: Balance the types of examples to ensure accurate predictions.
- Quality: Use clear, relevant, and accurate examples.
- Format: Consistent formatting improves comprehension.
- Similarity: Align examples closely with the task to boost accuracy.
Even small tweaks in these variables can improve accuracy by up to 90%, demonstrating the power of thoughtful prompt design.
Benchmarking: Which Prompting Techniques Perform Best?
To evaluate prompting techniques, we benchmarked six top-performing methods against ChatGPT using the MMLU dataset, a comprehensive collection of diverse questions.
Key Findings:
- Few-Shot Chain-of-Thought (CoT) consistently delivered superior results, showcasing its versatility for reasoning and problem-solving tasks.
- Surprisingly, Self-Consistency, despite its popularity, showed limited effectiveness in comparison.
These benchmarks provide clear guidance for choosing the right techniques to optimize performance.
Humans vs. AI: Who Wins in Prompt Engineering?
How do humans stack up against AI in prompt engineering? To find out, we compared my manual efforts as a prompt engineer with an AI-driven tool called DSPy (Dee-Ess-Pie).
Results:
- The human prompt engineer (me, Sander Schulhoff) spent 20 hours refining a prompt for a binary classification task.
- "AI Prompt Engineer" generated a prompt in just 10 minutes and significantly outperformed my manual version.
- After slight adjustments, the AI-generated prompt achieved an F1 score of nearly 0.6.
DSPy is a Python library for automatically optimizing prompts. By generating and refining examples and explanations, DSPy optimizes prompts with impressive results. It is built inspired by the “Prompting as Programming” paradigm and has a Pytorch-like API.
The Future of Prompting
Prompting is the future of human-AI interaction, but it’s not without challenges.
The Prompt Report addresses common issues and proposes solutions, including:
- Prompt drift: (performance changes over time)
- Prompt injection: LLMs' exploitation through malicious inputs
- Multilingual prompting: Adapting techniques for different languages.
- Multimodal prompting: Combining text, images, and other data types.
- Agentic prompting: Empowering AI systems to take actions autonomously.
From this paper, we can see that prompting is not going away any time soon. The number of prompting techniques continues to grow to address the various difficulties associated with prompting.
If you are interested in learning more about prompting, we recommend reading The Prompt Report and our free Prompt Engineering Guide.
Here’s how to cite The Prompt Report:
@article{schulhoff2024prompt,
title={The Prompt Report: A Systematic Survey of Prompting Techniques},
author={Schulhoff, Sander and Ilie, Michael and Balepur, Nishant and Kahadze, Konstantine and Liu, Amanda and Si, Chenglei and Li, Yinheng and Gupta, Aayush and Han, HyoJung and Schulhoff, Sevien and others},
journal={arXiv preprint arXiv:2406.06608},
year={2024}
}
Sander Schulhoff
Sander Schulhoff is the Founder of Learn Prompting and an ML Researcher at the University of Maryland. He created the first open-source Prompt Engineering guide, reaching 3M+ people and teaching them to use tools like ChatGPT. Sander also led a team behind Prompt Report, the most comprehensive study of prompting ever done, co-authored with researchers from the University of Maryland, OpenAI, Microsoft, Google, Princeton, Stanford, and other leading institutions. This 76-page survey analyzed 1,500+ academic papers and covered 200+ prompting techniques.
Footnotes
-
Brown, T. B. (2020). Language models are few-shot learners. arXiv Preprint arXiv:2005.14165. ↩
-
Khattab, O., Singhvi, A., Maheshwari, P., Zhang, Z., Santhanam, K., Vardhamanan, S., Haq, S., Sharma, A., Joshi, T. T., Moazam, H., Miller, H., Zaharia, M., & Potts, C. (2023). DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines. arXiv Preprint arXiv:2310.03714. ↩