Learning from Negative Examples: Contrastive Chain-of-Thought (CoT) enhances LLM reasoning by incorporating both positive and negative examples.
Prompt Structure: Contrastive CoT prompts include a question, correct/incorrect answers, and the query.
Generating Examples: Automatically create contrastive examples by shuffling key entities from correct answers.
| Technique | Institution | Date of Publication | Paper | Code |
|---|---|---|---|---|
| Contrastive Chain-of-Thought (CoT) Prompting | DAMO Academy, Alibaba Group; Singapore University of Technology and Design; Nanyang Technological University, Singapore | Nov 2023 | Contrastive Chain-of-Thought Prompting | DAMO-NLP-SG/contrastive-cot |
Chain-of-Thought (CoT) Prompting improves the accuracy of Large Language Models (LLMs) on mathematical and reasoning tasks by prompting them to reason through intermediate steps before answering. Adding a simple statement like "Let's think step by step" to the prompt can significantly improve the model's performance. Adding a positive example, a question paired with a worked explanation, before the actual question improves the LLM's reasoning further. Learning from negative examples is not unique to humans; it can also be replicated in LLMs. Embedding negative examples in the prompt can help LLMs learn from them and generate better explanations for their answers.
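As a minimal sketch of the two baseline prompt styles described above, the helpers below build a zero-shot CoT prompt (the trigger phrase only) and a one-shot CoT prompt (one positive example before the actual question). The function names and the `Question:` / `Explanation:` / `Answer:` labels are illustrative assumptions, not a format prescribed by the paper.

```python
def zero_shot_cot(question: str) -> str:
    """Append the step-by-step trigger phrase to a bare question."""
    return f"Question: {question}\nAnswer: Let's think step by step."


def one_shot_cot(example_question: str, example_explanation: str, question: str) -> str:
    """Place one worked (positive) example before the actual question."""
    return (
        f"Question: {example_question}\n"
        f"Explanation: {example_explanation}\n\n"
        f"Question: {question}\n"
        "Explanation:"  # left open for the model to complete
    )


if __name__ == "__main__":
    print(zero_shot_cot("How many pages does James write a year?"))
```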
Inspired by humans' ability to learn from positive and negative examples, Contrastive Chain-of-Thought (CoT) Prompting employs negative examples along with positive ones to enhance the reasoning capabilities of language models. Showing the models which mistakes to avoid also makes them more accurate.
A Contrastive CoT prompt comprises the following components: an example question, a correct explanation for it, an incorrect explanation for it, and the actual query.
The example below shows Contrastive Chain-of-Thought prompting in action.
Question: James writes a 3-page letter to 2 different friends twice a week. How many pages does he write a year?
Explanation: He writes each friend 3*2=6 pages a week. So he writes 6*2=12 pages every week. That means he writes 12*52=624 pages a year.
Wrong Explanation: He writes each friend 12*52=624 pages a week. So he writes 3*2=6 pages every week. That means he writes 6*2=12 pages a year.
Question: James has 30 teeth. His dentist drills 4 of them and caps 7 more teeth than he drills. What percentage of James' teeth does the dentist fix?
Explanation: The dentist drills 4 teeth, so there are 30 - 4 = 26 teeth left. The dentist caps 7 more teeth than he drills, so he caps 4 + 7 = 11 teeth. Therefore, the dentist fixes a total of 4 + 11 = 15 teeth. To find the percentage of teeth the dentist fixes, we divide the number of teeth fixed by the total number of teeth and multiply by 100: 15/30 × 100 = 50%.
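The prompt layout used in this example can be assembled programmatically. The sketch below is one possible way to do it, assuming the `Question:` / `Explanation:` / `Wrong Explanation:` labels shown above; `ContrastiveDemo` and `build_contrastive_prompt` are hypothetical names introduced for illustration and are not taken from the authors' repository.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContrastiveDemo:
    question: str
    explanation: str        # correct reasoning to imitate
    wrong_explanation: str  # incorrect reasoning to avoid


def build_contrastive_prompt(demos: List[ContrastiveDemo], query: str) -> str:
    """Join each demonstration and the final query into one prompt string."""
    parts = [
        f"Question: {d.question}\n"
        f"Explanation: {d.explanation}\n"
        f"Wrong Explanation: {d.wrong_explanation}"
        for d in demos
    ]
    # The query ends with an open "Explanation:" for the model to complete.
    parts.append(f"Question: {query}\nExplanation:")
    return "\n\n".join(parts)


demo = ContrastiveDemo(
    question=("James writes a 3-page letter to 2 different friends twice a week. "
              "How many pages does he write a year?"),
    explanation=("He writes each friend 3*2=6 pages a week. So he writes 6*2=12 pages "
                 "every week. That means he writes 12*52=624 pages a year."),
    wrong_explanation=("He writes each friend 12*52=624 pages a week. So he writes 3*2=6 "
                       "pages every week. That means he writes 6*2=12 pages a year."),
)
query = ("James has 30 teeth. His dentist drills 4 of them and caps 7 more teeth "
         "than he drills. What percentage of James' teeth does the dentist fix?")

if __name__ == "__main__":
    print(build_contrastive_prompt([demo], query))
```

The resulting string can be sent to any chat or completion model; adding more demonstrations only requires appending to the list.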
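In the paper, the authors devise an automated method for generating contrastive examples. It identifies the key entities in a correct explanation, such as numbers and equations, and shuffles their positions to produce an incorrect explanation.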
Contrastive example generation
The image below shows an instance where a contrastive example is generated using a positive example.
From positive example to contrastive example
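The idea of deriving a wrong explanation by shuffling key entities can be sketched in a few lines. This is only an approximation under stated assumptions: a regular expression that matches numbers and simple equations stands in for the entity-identification step, and `ENTITY_PATTERN` and `make_contrastive` are hypothetical names, not the authors' code.

```python
import random
import re
from typing import Optional

# Assumption: numbers and simple equations (e.g. "3*2=6") are the "key entities".
# A real pipeline might use an entity recognition model instead of a regex.
ENTITY_PATTERN = re.compile(r"\d+(?:\s*[*+\-/]\s*\d+)*(?:\s*=\s*\d+)?")


def make_contrastive(explanation: str, seed: Optional[int] = None) -> str:
    """Return a copy of `explanation` with its entities in shuffled positions."""
    rng = random.Random(seed)
    entities = ENTITY_PATTERN.findall(explanation)
    if len(set(entities)) < 2:
        return explanation  # fewer than two distinct entities: nothing to shuffle

    shuffled = entities[:]
    while shuffled == entities:  # make sure the order really changes
        rng.shuffle(shuffled)

    # Put the shuffled entities back into the original slots, left to right.
    replacements = iter(shuffled)
    return ENTITY_PATTERN.sub(lambda _m: next(replacements), explanation)


if __name__ == "__main__":
    correct = ("He writes each friend 3*2=6 pages a week. So he writes 6*2=12 pages "
               "every week. That means he writes 12*52=624 pages a year.")
    print(make_contrastive(correct, seed=0))
```

The surrounding wording is left untouched, so the output keeps the fluent surface form of the positive example while its reasoning no longer holds.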
The code for Contrastive Chain-of-Thought (CoT) Prompting is open-sourced and available for further research and implementation at DAMO-NLP-SG/contrastive-cot.
Learning from negative examples is not unique to humans and can also be replicated in language models. Embedding negative examples with the prompt can help language models learn from them and generate better explanations for their answers.
Valeriia Kuka, Head of Content at Learn Prompting, is passionate about making AI and ML accessible. Valeriia previously grew a 60K+ follower AI-focused social media account, earning reposts from Stanford NLP, Amazon Research, Hugging Face, and AI researchers. She has also worked with AI/ML newsletters and global communities with 100K+ members and authored clear and concise explainers and historical articles.