Large Language Models (LLMs) have made remarkable advances in applications such as language translation and problem-solving, but they still struggle to give stable, accurate answers on complex tasks. For instance, LLMs frequently make errors when solving high school maths problems.
An instance where an LLM fails to correctly predict the answer
LLMs are fast, intuitive thinkers that lack the capacity for slower, more deliberate thought. Chain-of-Thought (CoT) Prompting and Tree-of-Thought (ToT) Prompting help guide an LLM through a more structured reasoning process, but they cannot give it the ability to dynamically store and reuse intermediate results.
Cumulative Reasoning (CR) Prompting uses three key roles: the proposer, verifier(s), and reporter. They work together to suggest, check, and compile the reasoning steps into a complete solution.
In CR prompting, the proposer starts the reasoning process by suggesting actions to take. The verifier evaluates each suggestion and decides whether it will lead to a valid conclusion. If the verifier judges that it will not, the proposer offers new suggestions. The process continues until the verifier judges that the proposer's suggestions can lead to a valid conclusion. At that point, the reporter combines the reasoning steps into a final solution.
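The loop described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the paper's implementation: the names `cumulative_reasoning`, `propose`, `verify`, `report`, and `max_rounds` are hypothetical, and each role is a plain callable where a real CR system would prompt an LLM. Verified steps are kept in `context` so later rounds can build on them; rejected suggestions are simply dropped.

```python
import operator
from typing import Callable, List, Optional

# Hypothetical role signatures. In a real CR setup each role would be an LLM
# call with its own prompt; here they are ordinary Python callables.
Proposer = Callable[[str, List[str]], str]
Verifier = Callable[[str, List[str], str], bool]
Reporter = Callable[[str, List[str]], Optional[str]]


def cumulative_reasoning(problem: str, propose: Proposer, verify: Verifier,
                         report: Reporter, max_rounds: int = 10) -> Optional[str]:
    """Propose -> verify -> report loop that keeps only verified steps."""
    context: List[str] = []  # verified intermediate results, reused each round
    for _ in range(max_rounds):
        proposition = propose(problem, context)
        if not verify(problem, context, proposition):
            continue  # rejected: discard it and ask the proposer again
        context.append(proposition)  # accepted: store it for later rounds
        answer = report(problem, context)
        if answer is not None:  # reporter judges the steps sufficient
            return answer
    return None  # no conclusion within the round budget


# Toy usage with scripted stand-ins for the three roles.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
scripted = iter(["2 + 3 = 5", "5 * 4 = 21", "5 * 4 = 20"])


def propose_step(problem: str, context: List[str]) -> str:
    return next(scripted)  # a real proposer would read problem and context


def check_step(problem: str, context: List[str], step: str) -> bool:
    a, op, b, _, c = step.split()  # e.g. "5 * 4 = 20"
    return OPS[op](int(a), int(b)) == int(c)


def report_steps(problem: str, context: List[str]) -> Optional[str]:
    # Hypothetical stopping rule: this toy problem needs exactly two steps.
    return "; ".join(context) if len(context) == 2 else None


answer = cumulative_reasoning("Compute (2 + 3) * 4", propose_step,
                              check_step, report_steps)
print(answer)  # 2 + 3 = 5; 5 * 4 = 20
```

Here the scripted proposer makes one arithmetic mistake (`5 * 4 = 21`), which the verifier rejects, so only the two correct steps reach the reporter.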
Let's use CR to solve the Game of 24. In this game, the goal is to combine four given numbers, using each exactly once, with basic arithmetic operations (addition, subtraction, multiplication, and division) to reach 24. For this example, let's take 4, 9, 10, and 13 as the four input numbers.
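To make the roles concrete for this puzzle, here is a deterministic sketch in which ordinary code stands in for each LLM role. This is an illustrative assumption, not how CR itself is implemented: the hypothetical `propose` enumerates ways to merge two of the remaining numbers, `verify` swaps in an exhaustive search for an LLM's judgment of whether 24 is still reachable, and `solve_24` plays the reporter by collecting the accepted steps. `Fraction` keeps division exact.

```python
from fractions import Fraction
from itertools import combinations
from typing import Iterator, List, Optional, Tuple

Item = Tuple[Fraction, str]  # (value, expression that produced it)


def apply_ops(a: Item, b: Item) -> Iterator[Item]:
    """All results of combining two items with +, -, *, / (both orders)."""
    (x, ex), (y, ey) = a, b
    yield x + y, f"({ex} + {ey})"
    yield x * y, f"({ex} * {ey})"
    yield x - y, f"({ex} - {ey})"
    yield y - x, f"({ey} - {ex})"
    if y != 0:
        yield x / y, f"({ex} / {ey})"
    if x != 0:
        yield y / x, f"({ey} / {ex})"


def propose(state: List[Item]) -> Iterator[List[Item]]:
    """Proposer stand-in: suggest replacing two numbers with one result."""
    for i, j in combinations(range(len(state)), 2):
        rest = [state[k] for k in range(len(state)) if k not in (i, j)]
        for item in apply_ops(state[i], state[j]):
            yield rest + [item]  # the new result is always the last item


def verify(state: List[Item]) -> bool:
    """Verifier stand-in: exhaustively check whether 24 is still reachable."""
    if len(state) == 1:
        return state[0][0] == 24
    return any(verify(nxt) for nxt in propose(state))


def solve_24(numbers: List[int]) -> Optional[List[str]]:
    state: List[Item] = [(Fraction(n), str(n)) for n in numbers]
    steps: List[str] = []  # accumulated, verified intermediate results
    while len(state) > 1:
        for candidate in propose(state):
            if verify(candidate):  # accept the first proposal that can succeed
                state = candidate
                value, expr = state[-1]
                steps.append(f"{expr} = {value}")
                break
        else:
            return None  # every proposal was rejected: no solution
    # Reporter stand-in: the accepted steps form the final answer.
    return steps


for line in solve_24([4, 9, 10, 13]) or []:
    print(line)
```

One valid answer for 4, 9, 10, and 13 is (10 - 4) × (13 - 9) = 6 × 4 = 24; the sketch may report a different but equally valid sequence of steps. Because this verifier is exhaustive, it never accepts a dead end, whereas an LLM-based verifier offers no such guarantee.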
CR's accuracy in Game of 24 compared to other models
CR's accuracy in the FOLIO wiki dataset
Cumulative Reasoning mimics the deliberate thought processes humans use when solving complex tasks. Using a proposer, verifier(s), and reporter, CR decomposes a complex problem into smaller, easier-to-solve steps and combines their individual solutions into a complete answer, demonstrating CR's potential to advance LLMs' ability to solve complex tasks.
Bhuwan Bhatt, a Machine Learning Engineer with over 5 years of industry experience, is passionate about solving complex challenges at the intersection of machine learning and Python programming. Bhuwan has contributed his expertise to leading companies, driving innovation in AI/ML projects. Beyond his professional endeavors, Bhuwan is deeply committed to sharing his knowledge and experiences with others in the field. He firmly believes in continuous improvement, striving to grow by 1% each day in both his technical skills and personal development.
Yifan Zhang. (2023). Cumulative Reasoning with Large Language Models. https://arxiv.org/abs/2308.04371
Jason Wei. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.
Shunyu Yao. (2023). Tree of Thoughts: Deliberate Problem Solving with Large Language Models.