The Prompt Report: Insights from The Most Comprehensive Study of Prompting Ever Done

December 12th, 2024

4 minutes

🟢easy Reading Level

I recently led a team of 32 researchers from top institutions like OpenAI, Google, and Stanford to systematically analyze 1,500+ academic papers on prompting. As a result, we created The Prompt Report, an 80+ page survey that’s the most comprehensive exploration of prompting techniques ever published.

But we didn’t stop there.

At Learn Prompting, we expanded on the report’s findings to create the most comprehensive Prompt Engineering Guide with prompting tips for every level. Specifically, we created documentation for all 58 prompting techniques detailed in The Prompt Report.

Here’s what we discovered about how to write better prompts, why it matters, and how you can apply these findings in your own work.

A Taxonomy of Prompting Techniques: 6 Categories, 58 Methods

Our research introduced a structured taxonomy of 58 text-based prompting techniques, grouped into 6 problem-solving categories.

Few-Shot Prompting: Improve results by showing examples.
Thought Generation: Encourage reasoning with methods like Chain-of-Thought (CoT) and Thread-of-Thought (ThoT).
Zero-Shot Prompting: Skip examples; rely on precise instructions.
Ensembling: Combine outputs from multiple prompts to enhance reliability.
Self-Criticism: Ask the AI to critique and refine its own responses.
Decomposition: Simplify complex tasks into smaller, more manageable problems.

Each technique comes with a detailed guide on Learn Prompting, explaining:

What it does and when to use it.
How it differs from similar techniques.
Practical tips, templates, and real-world examples.

Tip

Start with Few-Shot and Chain-of-Thought techniques for impactful improvements in reasoning and task performance.

What Makes Exemples So Powerful in Prompt Design? In-Context Learning (ICL)

In-Context Learning (ICL), first identified in 2020, is one of AI’s most powerful yet enigmatic capabilities. It allows large language models (LLMs) to perform new tasks based solely on the examples and instructions provided in a prompt without retraining. This approach often relies on the few-shot prompting technique.

The Art of Exemplars

Crafting effective prompts depends on exemplars - examples provided within the prompt. Six key factors influence their effectiveness:

Quantity: More is usually better, up to a limit.
Order: Avoid recency bias by alternating positive and negative examples.
Label Distribution: Balance the types of examples to ensure accurate predictions.
Quality: Use clear, relevant, and accurate examples.
Format: Consistent formatting improves comprehension.
Similarity: Align examples closely with the task to boost accuracy.

Even small tweaks in these variables can improve accuracy by up to 90%, demonstrating the power of thoughtful prompt design.

Benchmarking: Which Prompting Techniques Perform Best?

To evaluate prompting techniques, we benchmarked six top-performing methods against ChatGPT using the MMLU dataset, a comprehensive collection of diverse questions.

Key Findings:

Few-Shot Chain-of-Thought (CoT) consistently delivered superior results, showcasing its versatility for reasoning and problem-solving tasks.
Surprisingly, Self-Consistency, despite its popularity, showed limited effectiveness in comparison.

These benchmarks provide clear guidance for choosing the right techniques to optimize performance.

Humans vs. AI: Who Wins in Prompt Engineering?

How do humans stack up against AI in prompt engineering? To find out, we compared my manual efforts as a prompt engineer with an AI-driven tool called DSPy (Dee-Ess-Pie).

Results:

The human prompt engineer (me, Sander Schulhoff) spent 20 hours refining a prompt for a binary classification task.
"AI Prompt Engineer" generated a prompt in just 10 minutes and significantly outperformed my manual version.
After slight adjustments, the AI-generated prompt achieved an F1 score of nearly 0.6.

DSPy is a Python library for automatically optimizing prompts. By generating and refining examples and explanations, DSPy optimizes prompts with impressive results. It is built inspired by the “Prompting as Programming” paradigm and has a Pytorch-like API.

The Future of Prompting

Prompting is the future of human-AI interaction, but it’s not without challenges.

The Prompt Report addresses common issues and proposes solutions, including:

Prompt drift: (performance changes over time)
Prompt injection: LLMs' exploitation through malicious inputs
Multilingual prompting: Adapting techniques for different languages.
Multimodal prompting: Combining text, images, and other data types.
Agentic prompting: Empowering AI systems to take actions autonomously.

From this paper, we can see that prompting is not going away any time soon. The number of prompting techniques continues to grow to address the various difficulties associated with prompting.

Tip

If you are interested in learning more about prompting, we recommend reading The Prompt Report and our free Prompt Engineering Guide.

Here’s how to cite The Prompt Report:

@article{schulhoff2024prompt,
    title={The Prompt Report: A Systematic Survey of Prompting Techniques},
    author={Schulhoff, Sander and Ilie, Michael and Balepur, Nishant and Kahadze, Konstantine and Liu, Amanda and Si, Chenglei and Li, Yinheng and Gupta, Aayush and Han, HyoJung and Schulhoff, Sevien and others},
    journal={arXiv preprint arXiv:2406.06608},
    year={2024}
}

Sander Schulhoff

Sander Schulhoff is the CEO of HackAPrompt and Learn Prompting. He created the first Prompt Engineering guide on the internet, two months before ChatGPT was released, which has taught 3 million people how to prompt ChatGPT. He also partnered with OpenAI to run the first AI Red Teaming competition, HackAPrompt, which was 2x larger than the White House's subsequent AI Red Teaming competition. Today, HackAPrompt partners with the Frontier AI labs to produce research that makes their models more secure. Sander's background is in Natural Language Processing and deep reinforcement learning. He recently led the team behind The Prompt Report, the most comprehensive study of prompt engineering ever done. This 76-page survey, co-authored with OpenAI, Microsoft, Google, Princeton, Stanford, and other leading institutions, analyzed 1,500+ academic papers and covered 200+ prompting techniques.

Footnotes

Brown, T. B. (2020). Language models are few-shot learners. arXiv Preprint arXiv:2005.14165. ↩
Khattab, O., Singhvi, A., Maheshwari, P., Zhang, Z., Santhanam, K., Vardhamanan, S., Haq, S., Sharma, A., Joshi, T. T., Moazam, H., Miller, H., Zaharia, M., & Potts, C. (2023). DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines. arXiv Preprint arXiv:2310.03714. ↩

DIFFICULTY LEVEL

RECOMMENDED COURSES

ChatGPT for Everyone

Introduction to Prompt Engineering

AI Red-Teaming and AI Security Masterclass

Live Courses