Announcing our new Paper: The Prompt Report, with Co-authors from OpenAI & Microsoft!

Check it out →
🧙‍♂️ Weiterführendes🟦 Least to Most Prompting

🟦 Least to Most Prompting

Last updated on August 7, 2024 by Sander Schulhoff

Least to Most prompting (LtM)1 takes CoT prompting a step further by first breaking a problem into sub problems then solving each one. It is a technique inspired by real-world educational strategies for children.

As in CoT prompting, the problem to be solved is decomposed in a set of subproblems that build upon each other. In a second step, these subproblems are solved one by one. Contrary to chain of thought, the solution of previous subproblems is fed into the prompt trying to solve the next problem.

Diagram of a Least to Most prompting

Example: Customer Inquiry Response

Let's ask a slightly complicated customer service question:


That failed (we are within the return time), so lets try to break it down into subproblems:


Let's try to solve the first subproblem:

By just solving the first subproblem, we were able to solve the entire problem. If GPT-3 did not return an answer immediately, we could have solved the next subproblem and so on until it did return an answer. Note that we use Let's go step by step.. The addition of this phrase is not always necessary, but it helps for this example.

Example: letter concatenation

LtM was originally introduced using few-shot prompting, rather than an explicit instruction to break down a problem into multiple steps (as seen above). Additionally, it can sometimes be implemented with a single prompt rather than chained prompts. Let's examine the problem of concatenating the last letter of individual words2 (for example, given cake, etymology as input words, the output should be ey).

First attempt: Standard

The standard prompt with few-shot examples performs very poorly, even with a more advanced model such as text-davinci-003.

Second attempt: Chain of Thought

Chain of Thought performs significantly better than standard prompting. This is because it now allows the model to consider extracting the last letter of each word on its own, reducing the complexity down to the operation of grouping letters it has previously collected. However, this starts to break down at larger sizes.

Third attempt: Least to Most (single prompt)

With Least to Most prompting, we augment the Chain of Thought concept by reformulating the individual steps to restate the previously concatenated result. This simplifies each step to concatenating only a single new letter. This leads to good performance all the way to 12 or more words.

This approach may look very similar to Chain of Thought, but it is conceptually very different. Here, at every step, we introduce the previous concatenation. In the case of "think, machine, learning", instead of concatenating the letters "k", "e", "g" individually, it will concatenate "k" and "e", then "ke" and "g". As a result of this reintroduction of the previous work, the model can now generalize to much longer chains because it carries the result incrementally along and only needs to do a small amount of work at each step.

Results

On the last letter concatenation problem with 12 words, Chain of Thought is 34% accurate, while Least to Most is 74% accurate (the paper uses text-davinci-002 as a model).

Example: compositional generalization (SCAN)

The SCAN benchmark3 requires the model to convert natural language to sequences of actions. For example, the sentence "run left and walk twice" would be translated to "TURN_LEFT + RUN + WALK * 2". Language models perform especially poorly when confronted with sequences that are longer than those in the training set.

First attempt: Standard prompting

Using simple standard prompts, text-davinci-003 gets impressively far, but still fails.

Second attempt: Least to Most, first step - Reduction

Here, we work with 2 different prompts. The first prompt is used to reduce the input problem into a simpler sequence of steps. The second prompt is used to map this simplified sequence of steps into actual actions.

Both prompts are pretty long, and use compressed python notation for the action to save on tokens.

The first step breaks the natural language description down in a more explicit, yet still human-like language. This will help the mapping step figure things out in sequence. For example, "jump around left twice" is reduced to "jump left" -> TURN_LEFT + JUMP and "jump around left" -> (TURN_LEFT + JUMP) * 4. Similarly, the reduction step is what is used to explain the concept of repetition (twice, thrice, etc...).

Second attempt: Least to Most, second step - Mapping

In the second step, we use the output of the reduction, and again use a fairly long prompt (14 cases) to translate the reduced natural language description into a sequence of actions.

Here, we inject the output of the first step:

"jump around left twice" can be solved by: "jump left", "jump around left", "jump around left twice". "walk opposite left thrice" can be solved by: "walk opposite left", "walk opposite left thrice". So, "jump around left twice after walk opposite left thrice" can be solved by: "jump left", "jump around left", "jump around left twice", "walk opposite left", "walk opposite left thrice".

into the LLM.

Results

LtM leads to multiple improvements:

  • improved accuracy over Chain of Thought
  • increased generalization on problems harder than those in the prompt
  • dramatically improved performance in compositional generalization, in particular the SCAN benchmark3

Standard prompting with text-davinci-002 (the model used in the paper) results in 6% of successful SCAN problems solved, while Least to Most prompting results in an impressive 76% success rate. The results are event more significant with code-davinci-002, where Least to Most prompting achieves a 99.7% success rate.

Footnotes

  1. Zhou, D., Schärli, N., Hou, L., Wei, J., Scales, N., Wang, X., Schuurmans, D., Cui, C., Bousquet, O., Le, Q., & Chi, E. (2022). Least-to-Most Prompting Enables Complex Reasoning in Large Language Models.

  2. Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2022). Chain of Thought Prompting Elicits Reasoning in Large Language Models.

  3. Lake, B. M., & Baroni, M. (2018). Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks. https://doi.org/10.48550/arXiv.1711.00350 2

Word count: 0
Copyright © 2024 Learn Prompting.