Least-to-Most prompting (LtM) takes Chain-of-Thought Prompting (CoT) a step further by first breaking a problem into sub problems then solving each one. It is a technique inspired by real-world educational strategies for children.
As in Chain-of-Thought prompting, the problem to be solved is decomposed in a set of subproblems that build upon each other. In a second step, these subproblems are solved one by one. Contrary to Chain-of-Thought, the solution of previous subproblems is fed into the prompt trying to solve the next problem.
Let's ask a slightly complicated customer service question:
That failed (we are within the return time), so let's try to break it down into subproblems:
Let's try to solve the first subproblem:
By just solving the first subproblem, we were able to solve the entire problem. If GPT-3 did not return an answer immediately, we could have solved the next subproblem and so on until it did return an answer. Note that we use Let's go step by step.
. The addition of this phrase is not always necessary, but it helps for this example.
LtM was originally introduced using Few-Shot Prompting, rather than an explicit instruction to break down a problem into multiple steps (as seen above). Additionally, it can sometimes be implemented with a single prompt rather than chained prompts. Let's examine the problem of concatenating the last letter of individual words (for example, given cake, etymology
as input words, the output should be ey
).
The standard prompt with Few-Shot examples performs very poorly, even with a more advanced model such as text-davinci-003.
Chain-of-Thought performs significantly better than standard prompting. This is because it now allows the model to consider extracting the last letter of each word on its own, reducing the complexity down to the operation of grouping letters it has previously collected. However, this starts to break down at larger sizes.
With Least-to-Most prompting, we augment the Chain-of-Thought concept by reformulating the individual steps to restate the previously concatenated result. This simplifies each step to concatenating only a single new letter. This leads to good performance all the way to 12 or more words.
This approach may look very similar to Chain-of-Thought, but it is conceptually very different. Here, at every step, we introduce the previous concatenation. In the case of "think, machine, learning", instead of concatenating the letters "k", "e", and "g" individually, it will concatenate "k" and "e", then "ke" and "g". As a result of this reintroduction of the previous work, the model can now generalize to much longer chains because it carries the result incrementally along and only needs to do a small amount of work at each step.
On the last letter concatenation problem with 12 words, Chain-of-Thought is 34% accurate, while Least-to-Most is 74% accurate (the paper uses text-davinci-002 as a model).
The SCAN benchmark requires the model to convert natural language to sequences of actions. For example, the sentence "run left and walk twice" would be translated to "TURN_LEFT + RUN + WALK * 2". Language models perform especially poorly when confronted with sequences that are longer than those in the training set.
Using simple standard prompts, text-davinci-003 gets impressively far but still fails.
Here, we work with 2 different prompts. The first prompt is used to reduce the input problem into a simpler sequence of steps. The second prompt is used to map this simplified sequence of steps into actual actions.
Both prompts are pretty long and use compressed Python notation for the action to save on tokens.
The first step breaks the natural language description down in a more explicit, yet still human-like language. This will help the mapping step figure things out in sequence.
For example, "jump around left twice" is reduced to "jump left" -> TURN_LEFT + JUMP
and "jump around left" -> (TURN_LEFT + JUMP) * 4
. Similarly, the reduction step is what is used to explain the concept of repetition (twice, thrice, etc...).
In the second step, we use the output of the reduction, and again use a fairly long prompt (14 cases) to translate the reduced natural language description into a sequence of actions.
Here, we inject the output of the first step into the LLM:
LtM leads to multiple improvements:
Standard prompting with text-davinci-002 (the model used in the paper) results in 6% of successful SCAN problems solved, while Least-to-Most prompting results in an impressive 76% success rate. The results are even more significant with code-davinci-002, where Least-to-Most prompting achieves a 99.7% success rate.
Least-to-Most prompting helps generalize the Chain-of-Thought technique to more difficult inputs by breaking down complex problems into simpler subsections. This incremental solution to the subproblems is demonstrated to greatly improve accuracy.
Least-to-Most prompting improves upon Chain-of-Thought prompting because it breaks down a complex problem into simpler and more manageable subproblems. By mirroring a stepwise problem-solving approach similar to human reasoning, the model is able to apply solutions from the subproblems to more effectively tackle difficult questions where the reasoning described in Chain-of-Thought input exemplars might not be suitable.
Sander Schulhoff is the Founder of Learn Prompting and an ML Researcher at the University of Maryland. He created the first open-source Prompt Engineering guide, reaching 3M+ people and teaching them to use tools like ChatGPT. Sander also led a team behind Prompt Report, the most comprehensive study of prompting ever done, co-authored with researchers from the University of Maryland, OpenAI, Microsoft, Google, Princeton, Stanford, and other leading institutions. This 76-page survey analyzed 1,500+ academic papers and covered 200+ prompting techniques.
Zhou, D., SchΓ€rli, N., Hou, L., Wei, J., Scales, N., Wang, X., Schuurmans, D., Cui, C., Bousquet, O., Le, Q., & Chi, E. (2022). Least-to-Most Prompting Enables Complex Reasoning in Large Language Models. β©
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2022). Chain of Thought Prompting Elicits Reasoning in Large Language Models. β©
Lake, B. M., & Baroni, M. (2018). Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks. https://doi.org/10.48550/arXiv.1711.00350 β© β©2