Fine-grained feedback: Reversing Chain-of-Thought (RCoT) detects and eliminates factual inconsistencies in LLM reasoning by using detailed feedback.
Condition checking: RCoT identifies hallucinated or overlooked conditions through problem reconstruction and comparison.
Effective for complex tasks: RCoT's gains are largest on challenging problems, where it improves accuracy significantly.
What is Reversing Chain-of-Thought (RCoT) Prompting?
Chain-of-Thought (CoT) Prompting improves the accuracy of Large Language Models (LLMs) on arithmetic reasoning tasks by forcing the model to think step by step before answering. However, LLMs still struggle to maintain factual consistency during reasoning: they overlook conditions, misinterpret the original question, and may hallucinate. This is especially true when the original question contains multiple conditions. The example below shows an instance where ChatGPT overlooks the condition that the date provided in the question is tomorrow's date.
Prompt
Today's meeting is rescheduled to 11 am tomorrow, 10/16/1924. What is the date one year ago from today?
AI Output
The current date is 10/16/1924. To find the date one year ago from today, you would subtract one year from the current year, which would be 1923. The correct answer is 10/16/1923. ❌
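The condition ChatGPT overlooks can be made explicit with a few lines of ordinary date arithmetic: if 10/16/1924 is tomorrow's date, then "one year ago from today" is one year before the previous day.

```python
from datetime import date, timedelta

# The question states that 10/16/1924 is *tomorrow's* date,
# so today must be one day earlier.
tomorrow = date(1924, 10, 16)
today = tomorrow - timedelta(days=1)              # 1924-10-15

# One year ago from today: same month and day, previous year.
one_year_ago = today.replace(year=today.year - 1)
print(one_year_ago)  # 1923-10-15
```

The correct answer is therefore 10/15/1923, not 10/16/1923 as the model claims.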
Reversing Chain-of-Thought (RCoT) prompting aims to enhance LLMs' reasoning abilities by detecting and rectifying condition hallucinations. RCoT employs a three-step process to detect and rectify hallucinations:
RCoT uses the original incorrect solution to construct a new problem (say Q').
RCoT conducts a fine-grained comparison between the original problem (Q) and the reconstructed problem (Q') to detect hallucinated conditions, overlooked conditions, and question misinterpretations.
RCoT provides the detected shortcomings to the LLM in the form of feedback and guides it to the correct output.
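The three steps above can be sketched as a simple pipeline. This is a minimal illustration, not the paper's implementation: `llm` is a stand-in for any chat-completion call, stubbed here so the sketch is self-contained, and the prompt wording is an assumption.

```python
def llm(prompt: str) -> str:
    # Placeholder: a real implementation would call an LLM API here.
    return "<model response>"

def rcot(question: str, initial_solution: str) -> str:
    # Step 1: reconstruct a problem Q' from the candidate solution.
    reconstructed = llm(
        "Given the solution below, write the problem it solves:\n"
        f"{initial_solution}"
    )
    # Step 2: fine-grained comparison of Q and Q'.
    feedback = llm(
        "Compare the two problems and list any hallucinated conditions, "
        "overlooked conditions, or misinterpretations:\n"
        f"Original: {question}\nReconstructed: {reconstructed}"
    )
    # Step 3: feed the detected inconsistencies back for revision.
    return llm(
        f"Question: {question}\n"
        f"Your previous answer had these issues:\n{feedback}\n"
        "Please revise your solution."
    )
```

Each step is a separate conversation turn, which is also why RCoT costs more inference calls than plain CoT (see the limitations section).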
How to Use Reversing Chain-of-Thought?
Let's use RCoT to solve a math word problem. The dialog below shows the question and initial LLM response:
Step 1: Problem Reconstruction
Reconstruct the problem by providing the question and initial response to the LLM.
As you can see, the reconstructed problem has different facts than the ones stated in the original problem:
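A reconstruction prompt for Step 1 might look like the template below. The exact wording is an assumption for illustration, not the template from the RCoT paper.

```python
# Illustrative Step 1 prompt: ask the LLM to recover the problem
# that its own (possibly wrong) solution actually answers.
RECONSTRUCT_PROMPT = (
    "Below is a solution to a problem. Reconstruct the problem "
    "statement that this solution answers, including every number "
    "and condition the solution relies on.\n\n"
    "Solution: {solution}"
)

prompt = RECONSTRUCT_PROMPT.format(
    solution="The current date is 10/16/1924. ... The correct answer is 10/16/1923."
)
print(prompt)
```

If the model hallucinated or overlooked a condition, the reconstructed problem will encode that error, which is what makes it detectable in Step 2.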
Step 2 (i): Problem Decomposition
Now, let's decompose both problems into a list of conditions and facts stated in the problem.
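The same decomposition prompt is applied to both Q and Q'. Again, the wording below is an illustrative assumption, not the paper's exact template.

```python
# Illustrative Step 2 (i) prompt: extract a flat list of conditions
# from a problem without solving it.
DECOMPOSE_PROMPT = (
    "List every condition stated in the problem below, one per line. "
    "Do not solve the problem.\n\nProblem: {problem}"
)

for name, problem in [("Q", "<original problem>"), ("Q'", "<reconstructed problem>")]:
    print(name, "->", DECOMPOSE_PROMPT.format(problem=problem))
```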
Step 2 (ii): Condition Comparison
Detect hallucinated and overlooked conditions from the condition list.
Overlooked condition:
Hallucinated condition:
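In RCoT the LLM performs this comparison itself, but plain set differences over the two condition lists make the idea concrete. The condition strings below are illustrative.

```python
# Conditions extracted from the original problem Q and the
# reconstructed problem Q' (toy example, written by hand here).
conditions_q = {"the meeting is at 11 am", "10/16/1924 is tomorrow's date"}
conditions_q_prime = {"the meeting is at 11 am", "10/16/1924 is today's date"}

overlooked = conditions_q - conditions_q_prime    # in Q but missing from Q'
hallucinated = conditions_q_prime - conditions_q  # in Q' but absent from Q

print("Overlooked:", overlooked)
print("Hallucinated:", hallucinated)
```

Here the model overlooked that the given date is tomorrow's and hallucinated that it is today's, which is exactly the error in the opening example.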
Step 2 (iii): Question Comparison
Compare the questions to check if the LLM can detect the difference in facts within the questions.
Step 3: Fine-Grained Feedback and Revision
Collect all factual inconsistencies, provide feedback to the LLM, and instruct it to revise its solution.
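Assembling the detected inconsistencies into a feedback message can be sketched as below. The helper function and its wording are illustrative, not taken from the paper.

```python
# Minimal sketch: turn detected inconsistencies into fine-grained
# feedback that is sent back to the LLM for revision.
def build_feedback(overlooked, hallucinated, misinterpretation=None):
    lines = []
    for cond in overlooked:
        lines.append(f"You overlooked the condition: {cond}")
    for cond in hallucinated:
        lines.append(f"You hallucinated the condition: {cond}")
    if misinterpretation:
        lines.append(f"You misinterpreted the question: {misinterpretation}")
    lines.append("Please revise your solution with these corrections in mind.")
    return "\n".join(lines)

feedback = build_feedback(
    overlooked=["10/16/1924 is tomorrow's date"],
    hallucinated=["10/16/1924 is today's date"],
)
print(feedback)
```

The resulting message is appended to the conversation together with the original question, prompting the model to produce a revised solution.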
What Are Reversing Chain-of-Thought Results?
RCoT consistently outperforms standard CoT and the double-check method (which asks the LLM to check its answer without indicating whether the answer is correct) in the Zero-Shot setting.
RCoT Zero-Shot results
The performance gain from RCoT is greater on challenging tasks than on easier ones. For instance, ChatGPT's accuracy improves by 4.1% on AQuA and by 5.0% on the Date dataset. AQuA contains diverse problems, and the Date dataset requires multi-hop reasoning and commonsense date knowledge, making both more complex than the other datasets.
On simpler datasets such as SVAMP, whose problems usually need only a one-step calculation, the improvement is smaller: ChatGPT's accuracy rises by 2.8%.
Results in the Few-Shot setting are similar to those in the Zero-Shot setting.
RCoT can effectively detect overlooking and misinterpretation errors but struggles to detect hallucination errors.
RCoT's effectiveness in detecting errors.
Limitations of Reversing Chain-of-Thought
Some limitations of RCoT include:
RCoT cannot detect all possible reasoning errors. For instance, RCoT struggles to detect computational errors.
There is a significant difference between RCoT-generated feedback and human-generated feedback.
RCoT involves multiple conversations with the LLM, which consumes additional resources (hardware, money, etc.) and slows down inference.
Conclusion
RCoT improves the reasoning abilities of LLMs by enabling them to detect and rectify factual inconsistencies in their generated solutions. It detects these inconsistencies through a fine-grained comparison between the original question and a question reconstructed from the model's initial (incorrect) answer. It then reports the detected inconsistencies to the LLM as fine-grained feedback, guiding it to the correct solution.
Bhuwan Bhatt
Bhuwan Bhatt, a Machine Learning Engineer with over 5 years of industry experience, is passionate about solving complex challenges at the intersection of machine learning and Python programming. Bhuwan has contributed his expertise to leading companies, driving innovation in AI/ML projects. Beyond his professional endeavors, Bhuwan is deeply committed to sharing his knowledge and experiences with others in the field. He firmly believes in continuous improvement, striving to grow by 1% each day in both his technical skills and personal development.
Footnotes
Jason Wei et al. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.
Tianci Xue et al. (2023). RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought.