Chain-of-Thought (CoT) prompting improves the accuracy of Large Language Models (LLMs) on arithmetic reasoning tasks by making the model reason step by step before answering. However, LLMs still struggle to maintain factual consistency during reasoning: they overlook conditions, misinterpret the original question, and hallucinate facts, especially when the question contains multiple conditions. The example below shows an instance where ChatGPT overlooks the condition that the date provided in the question is tomorrow's date.
Question: Today's meeting is rescheduled to 11 am tomorrow, 10/16/1924. What is the date one year ago from today?
ChatGPT: The current date is 10/16/1924. To find the date one year ago from today, you would subtract one year from the current year, which would be 1923. The correct answer is 10/16/1923. ✗
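Since the question states that 10/16/1924 is tomorrow's date, today must be 10/15/1924, and one year ago from today is 10/15/1923. A quick sanity check with Python's datetime module:

```python
from datetime import date, timedelta

# The question says tomorrow is 10/16/1924, so today is one day earlier.
tomorrow = date(1924, 10, 16)
today = tomorrow - timedelta(days=1)

# One year ago from today: same month and day, previous year.
one_year_ago = today.replace(year=today.year - 1)
print(one_year_ago)  # 1923-10-15, not 1923-10-16 as ChatGPT answered
```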
Reversing Chain-of-Thought (RCoT) prompting aims to enhance LLMs' reasoning abilities by detecting and rectifying condition hallucinations. RCoT employs a three-step process:
Let's use RCoT to solve a math word problem. The dialog below shows the question and initial LLM response:
Step 1: Problem Reconstruction
Reconstruct the problem by providing the question and initial response to the LLM.
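This reconstruction step can be sketched as a prompt template. The wording below is illustrative, not RCoT's exact prompt, and the `llm` client call in the comment is a hypothetical placeholder:

```python
def build_reconstruction_prompt(initial_response: str) -> str:
    """Ask the LLM to reconstruct the problem its own solution answers."""
    return (
        "Given the solution below, reconstruct the problem it solves. "
        "Include every condition and the final question the solution relies on.\n\n"
        f"Solution: {initial_response}\n\n"
        "Reconstructed problem:"
    )

prompt = build_reconstruction_prompt(
    "The current date is 10/16/1924. The correct answer is 10/16/1923."
)
# The reconstructed problem would then come back from the model, e.g.:
# reconstructed = llm(prompt)   # `llm` is a hypothetical client, not shown here
```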
As you can see, the reconstructed problem contains facts that differ from those stated in the original problem:
Step 2 (i): Problem Decomposition
Now, let's decompose both problems into a list of conditions and facts stated in the problem.
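Decomposition can likewise be sketched as a template applied to both the original and the reconstructed problem (again, the wording is an assumption, not RCoT's exact prompt):

```python
def build_decomposition_prompt(problem: str) -> str:
    """Ask the LLM to list the problem's conditions without solving it."""
    return (
        "List every condition stated in the problem below, one per line, "
        "then state the question being asked. Do not solve the problem.\n\n"
        f"Problem: {problem}"
    )

original_prompt = build_decomposition_prompt(
    "Today's meeting is rescheduled to 11 am tomorrow, 10/16/1924. "
    "What is the date one year ago from today?"
)
# The same template is applied to the reconstructed problem:
# reconstructed_prompt = build_decomposition_prompt(reconstructed_problem)
```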
Step 2 (ii): Condition Comparison
Detect hallucinated and overlooked conditions from the condition list.
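The comparison logic amounts to two set differences over the condition lists. In RCoT proper each pairwise comparison is itself delegated to the LLM, since a condition may be paraphrased rather than repeated verbatim; exact string matching here is a simplification for illustration:

```python
def compare_conditions(original: list[str], reconstructed: list[str]):
    """Return (hallucinated, overlooked) conditions.

    hallucinated: conditions the LLM invented (only in the reconstruction).
    overlooked:   conditions the LLM ignored (missing from the reconstruction).
    """
    orig, recon = set(original), set(reconstructed)
    hallucinated = recon - orig
    overlooked = orig - recon
    return hallucinated, overlooked

original = [
    "the meeting is at 11 am",
    "tomorrow's date is 10/16/1924",
]
reconstructed = [
    "the meeting is at 11 am",
    "today's date is 10/16/1924",  # the hallucinated condition
]
hallucinated, overlooked = compare_conditions(original, reconstructed)
```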
Step 2 (iii): Question Comparison
Compare the questions to check if the LLM can detect the difference in facts within the questions.
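As with the condition checks, this comparison is delegated to the LLM via a prompt along these lines (the wording is illustrative):

```python
def build_question_comparison_prompt(original_q: str, reconstructed_q: str) -> str:
    """Ask the LLM whether the two questions ask for exactly the same thing."""
    return (
        "Do the two questions below ask for exactly the same thing? "
        "Answer yes or no, and point out any difference in the facts they use.\n\n"
        f"Question 1: {original_q}\n"
        f"Question 2: {reconstructed_q}"
    )

prompt = build_question_comparison_prompt(
    "What is the date one year ago from today?",
    "What is the date one year ago from 10/16/1924?",
)
```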
Step 3: Fine-Grained Feedback and Revision
Collect all factual inconsistencies, provide feedback to the LLM, and instruct it to revise its solution.
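Assembling the feedback prompt from the detected inconsistencies can be sketched as follows (the feedback wording is again an assumption, not RCoT's exact template):

```python
def build_feedback_prompt(question, previous_solution, hallucinated, overlooked):
    """Turn detected inconsistencies into fine-grained revision feedback."""
    lines = ["Your solution contains the following factual inconsistencies:"]
    lines += [f"- You used a condition not in the question: {c}" for c in hallucinated]
    lines += [f"- You overlooked this condition: {c}" for c in overlooked]
    lines += [
        "Please revise your solution to the question below, fixing these issues.",
        f"Question: {question}",
        f"Your previous solution: {previous_solution}",
    ]
    return "\n".join(lines)

feedback = build_feedback_prompt(
    "What is the date one year ago from today?",
    "The correct answer is 10/16/1923.",
    hallucinated=["today's date is 10/16/1924"],
    overlooked=["tomorrow's date is 10/16/1924"],
)
# `feedback` is then sent back to the LLM to produce a revised solution.
```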
RCoT Zero-Shot Results
RCoT's Effectiveness in Detecting Errors
Some limitations of RCoT include:
RCoT improves the reasoning abilities of LLMs by enabling them to detect and rectify factual inconsistencies in their generated solutions. It detects these inconsistencies through a fine-grained comparison between the original question and a question reconstructed from the LLM's initial (incorrect) answer. It then reports the detected inconsistencies to the LLM as fine-grained feedback, guiding it to the correct solution.
Bhuwan Bhatt, a Machine Learning Engineer with over 5 years of industry experience, is passionate about solving complex challenges at the intersection of machine learning and Python programming. Bhuwan has contributed his expertise to leading companies, driving innovation in AI/ML projects. Beyond his professional endeavors, Bhuwan is deeply committed to sharing his knowledge and experiences with others in the field. He firmly believes in continuous improvement, striving to grow by 1% each day in both his technical skills and personal development.