Chain-of-Thought Prompting
- Chain-of-Thought (CoT) Prompting: This technique improves LLM performance by encouraging them to articulate their reasoning process, leading to more accurate answers.
- Task Effectiveness: CoT is particularly beneficial for complex, multi-step tasks and works best with larger models; with smaller models it can perform worse than standard prompting.
What is Chain-of-Thought Prompting?
Chain-of-Thought (CoT) Prompting is a technique that enhances the reasoning capabilities of large language models (LLMs) by incorporating logical steps—or a “chain of thought”—within the prompt. Unlike direct-answer prompting, CoT guides the model to work through intermediate reasoning steps, making it more adept at solving complex tasks like math problems, commonsense reasoning, and symbolic manipulation.
How Chain-of-Thought Prompting Differs from Existing Techniques
Traditional prompts typically consist of simple input-output examples and lack explicit reasoning steps, making it challenging for models to infer the necessary logic for tasks requiring multi-step reasoning. CoT prompting addresses this by:
- Encouraging Multi-Step Reasoning: Rather than relying solely on model size for complex tasks, CoT embeds reasoning steps within the prompt, unlocking sophisticated reasoning in models that might otherwise struggle with complexity.
- Achieving Efficiency without Finetuning: CoT works across tasks without the need for finetuning, using a standard prompt format that embeds reasoning, thus simplifying adaptation to various complex tasks.
The example below illustrates the difference between few-shot prompting (left) and CoT prompting (right). While the traditional approach goes directly to the solution, CoT guides the model to lay out its reasoning process, often resulting in more accurate and interpretable outcomes.
The key concept of CoT is that by providing a few examples (or exemplars), where the reasoning process is explicitly shown, the LLM learns to include reasoning steps in its responses. This structured approach to thinking often results in more accurate outputs.
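The contrast between the two prompt styles can be sketched in a few lines of Python. This is a minimal illustration rather than code from the CoT paper: the `Exemplar` class, the `build_prompt` function, and the bookshelf question are hypothetical, and the apple problem serves as the single exemplar. The only difference between the two prompts is whether the exemplar's answer includes its reasoning steps.

```python
from dataclasses import dataclass


@dataclass
class Exemplar:
    question: str
    reasoning: str  # worked steps, shown only in the CoT prompt
    answer: str


def build_prompt(exemplars, question, with_reasoning):
    """Assemble a few-shot prompt; include reasoning steps only for CoT."""
    blocks = []
    for ex in exemplars:
        if with_reasoning:
            shown = f"{ex.reasoning} The answer is {ex.answer}."
        else:
            shown = f"The answer is {ex.answer}."
        blocks.append(f"Q: {ex.question}\nA: {shown}")
    # The new question ends with an open "A:" for the model to complete.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


exemplars = [
    Exemplar(
        question="John has 10 apples. He gives away 4 and then receives 5 more. "
                 "How many apples does he have?",
        reasoning="John starts with 10 apples. He gives away 4, so 10 - 4 = 6. "
                  "He then receives 5 more, so 6 + 5 = 11.",
        answer="11",
    )
]
# Hypothetical new question, invented for this sketch.
new_question = "A shelf holds 7 books. 3 are removed and 8 are added. How many books are on the shelf?"

standard_prompt = build_prompt(exemplars, new_question, with_reasoning=False)
cot_prompt = build_prompt(exemplars, new_question, with_reasoning=True)
print(cot_prompt)
```

Either string would then be sent to the model; with the CoT version, the model is expected to imitate the exemplar and write out its own intermediate steps before the answer.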
How Chain-of-Thought Prompting Works
- Decompose the Problem: CoT prompts guide the model to break down a complex question into manageable steps, akin to how a human might solve the problem.
- Guide with Exemplars: CoT uses examples that demonstrate reasoning steps, helping the model grasp the method needed to reach the correct answer.
With CoT, the model essentially “talks through” its thought process, leading to more reliable answers.
Applications and Benefits
CoT prompting is especially valuable for tasks where structured reasoning is crucial:
- Mathematics and Arithmetic: CoT helps solve multi-step word problems by guiding calculations through each necessary step.
- Commonsense and Symbolic Reasoning: Useful for tasks requiring general knowledge or symbolic reasoning, where CoT can bridge the gap between facts and logical connections.
- Complex Decision-Making: In fields like robotics, CoT enables models to follow logical steps for decision-making tasks.
How to Use Chain-of-Thought Prompting
Chain-of-Thought Prompting Template
Q: John has 10 apples. He gives away 4 and then receives 5 more. How many apples does he have?
A:
- John starts with 10 apples.
- He gives away 4, so 10 - 4 = 6.
- He then receives 5 more apples, so 6 + 5 = 11.
Final Answer: 11
Q: [Your Question]
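One way to use this template programmatically is sketched below, under a few assumptions: the names `COT_TEMPLATE`, `fill_template`, and `extract_final_answer` are hypothetical, a trailing `A:` is added after the question to cue the model, and the sample completion is written by hand rather than produced by a real model. The sketch fills in the question and then pulls out whatever follows the `Final Answer:` marker.

```python
import re

COT_TEMPLATE = """Q: John has 10 apples. He gives away 4 and then receives 5 more. How many apples does he have?
A:
- John starts with 10 apples.
- He gives away 4, so 10 - 4 = 6.
- He then receives 5 more apples, so 6 + 5 = 11.
Final Answer: 11

Q: {question}
A:"""


def fill_template(question):
    """Insert the user's question after the worked exemplar."""
    return COT_TEMPLATE.format(question=question.strip())


def extract_final_answer(completion):
    """Return the text after the last 'Final Answer:' marker, or None."""
    matches = re.findall(r"Final Answer:\s*(.+)", completion)
    return matches[-1].strip() if matches else None


prompt = fill_template("Maria has 12 pencils. She loses 5 and buys 9. How many pencils does she have?")

# Hypothetical model output, written by hand for illustration only.
sample_completion = (
    "- Maria starts with 12 pencils.\n"
    "- She loses 5, so 12 - 5 = 7.\n"
    "- She buys 9 more, so 7 + 9 = 16.\n"
    "Final Answer: 16"
)
print(extract_final_answer(sample_completion))  # prints 16
```

Keeping a fixed marker such as `Final Answer:` in the exemplar makes the model's answer easy to separate from its reasoning steps.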
Examples
Here are two demos illustrating how CoT prompting improves outcomes. The first shows GPT-3 (text-davinci-003) failing on a word problem without CoT, while the second shows it succeeding with CoT.
Incorrect Solution (Without CoT)
Correct Solution (Using CoT)
Chain-of-Thought Results
Research has shown that CoT prompting can significantly improve LLM accuracy on arithmetic, commonsense, and symbolic reasoning tasks. For instance, a CoT-prompted PaLM 540B model achieved a 57% solve rate on GSM8K, a state-of-the-art (SOTA) result at the time.
The table below summarizes the performance improvements on key benchmarks when using CoT prompting:
Task | Model | Standard Prompting Accuracy | CoT Prompting Accuracy | Improvement (percentage points) |
---|---|---|---|---|
GSM8K (Math) | PaLM 540B | 18% | 57% | +39 |
SVAMP (Math) | PaLM 540B | 57% | 81% | +24 |
Commonsense (CSQA) | PaLM 540B | 76% | 80% | +4 |
Symbolic Reasoning | PaLM 540B | ~60% | ~95% | ~+35 |
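The Improvement column is an absolute difference in accuracy, that is, percentage points rather than a relative change. The short sketch below makes the distinction concrete using the CSQA row; the function names `absolute_gain` and `relative_gain` are hypothetical helpers introduced only for this illustration.

```python
def absolute_gain(standard_acc, cot_acc):
    """Gain in percentage points (the figure reported in the table)."""
    return cot_acc - standard_acc


def relative_gain(standard_acc, cot_acc):
    """Gain as a percentage of the standard-prompting accuracy."""
    return 100.0 * (cot_acc - standard_acc) / standard_acc


# CSQA row from the table: 76% with standard prompting, 80% with CoT.
print(absolute_gain(76, 80))            # 4
print(round(relative_gain(76, 80), 1))  # 5.3
```

A gain of a few points on an already strong baseline is a small relative change, which is why the commonsense result looks modest next to the math benchmarks.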
Limitations of Chain-of-Thought
Importantly, according to the CoT authors, CoT only yields performance gains with models of roughly 100B parameters or more. Smaller models wrote illogical chains of thought, which led to worse accuracy than standard prompting. In general, the performance boost from CoT prompting grows with the size of the model.
Conclusion
Chain-of-Thought Prompting is a powerful method for unlocking reasoning capabilities in large language models. By encouraging step-by-step thinking, CoT prompting allows models to perform complex reasoning tasks effectively without needing additional training data. The benefits are particularly pronounced in large models (e.g., models with over 100 billion parameters), which exhibit improved reasoning capacities as they follow these structured reasoning prompts.
FAQ
Why is Chain-of-Thought prompting effective?
Chain-of-Thought prompting works by providing the model with examples of logical reasoning. When shown how to approach problems step by step, the LLM is more likely to emulate this approach, which tends to make its responses more accurate and reliable.
What is a limitation of Chain-of-Thought prompting?
CoT prompting is less effective with smaller models. Meaningful gains appear mainly in large models (roughly 100B parameters or more); smaller models may produce incoherent reasoning with CoT prompting and end up less accurate than with standard prompting.
Valeriia Kuka
Valeriia Kuka, Head of Content at Learn Prompting, is passionate about making AI and ML accessible. Valeriia previously grew a 60K+ follower AI-focused social media account, earning reposts from Stanford NLP, Amazon Research, Hugging Face, and AI researchers. She has also worked with AI/ML newsletters and global communities with 100K+ members and authored clear and concise explainers and historical articles.
Footnotes
- Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2022). Chain of Thought Prompting Elicits Reasoning in Large Language Models.
- Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra, G., Roberts, A., Barham, P., Chung, H. W., Sutton, C., Gehrmann, S., Schuh, P., Shi, K., Tsvyashchenko, S., Maynez, J., Rao, A., Barnes, P., Tay, Y., Shazeer, N., Prabhakaran, V., … Fiedel, N. (2022). PaLM: Scaling Language Modeling with Pathways.
- Cobbe, K., Kosaraju, V., Bavarian, M., Chen, M., Jun, H., Kaiser, L., Plappert, M., Tworek, J., Hilton, J., Nakano, R., Hesse, C., & Schulman, J. (2021). Training Verifiers to Solve Math Word Problems.