Least to Most Prompting

🟦 This article is rated medium

Reading Time: 4 minutes

Last updated on August 7, 2024

Least to Most prompting (LtM) takes CoT prompting a step further by first breaking a problem into sub problems then solving each one. It is a technique inspired by real-world educational strategies for children.

As in CoT prompting, the problem to be solved is decomposed in a set of subproblems that build upon each other. In a second step, these subproblems are solved one by one. Contrary to chain of thought, the solution of previous subproblems is fed into the prompt trying to solve the next problem.

Diagram of a Least to Most prompting

Example: Customer Inquiry Response

Let's ask a slightly complicated customer service question:

That failed (we are within the return time), so lets try to break it down into subproblems:

Let's try to solve the first subproblem:

By just solving the first subproblem, we were able to solve the entire problem. If GPT-3 did not return an answer immediately, we could have solved the next subproblem and so on until it did return an answer. Note that we use Let's go step by step.. The addition of this phrase is not always necessary, but it helps for this example.

Example: letter concatenation

LtM was originally introduced using few-shot prompting, rather than an explicit instruction to break down a problem into multiple steps (as seen above). Additionally, it can sometimes be implemented with a single prompt rather than chained prompts. Let's examine the problem of concatenating the last letter of individual words (for example, given cake, etymology as input words, the output should be ey).

First attempt: Standard

The standard prompt with few-shot examples performs very poorly, even with a more advanced model such as text-davinci-003.

Second attempt: Chain of Thought

Chain of Thought performs significantly better than standard prompting. This is because it now allows the model to consider extracting the last letter of each word on its own, reducing the complexity down to the operation of grouping letters it has previously collected. However, this starts to break down at larger sizes.

Third attempt: Least to Most (single prompt)

With Least to Most prompting, we augment the Chain of Thought concept by reformulating the individual steps to restate the previously concatenated result. This simplifies each step to concatenating only a single new letter. This leads to good performance all the way to 12 or more words.

This approach may look very similar to Chain of Thought, but it is conceptually very different. Here, at every step, we introduce the previous concatenation. In the case of "think, machine, learning", instead of concatenating the letters "k", "e", "g" individually, it will concatenate "k" and "e", then "ke" and "g". As a result of this reintroduction of the previous work, the model can now generalize to much longer chains because it carries the result incrementally along and only needs to do a small amount of work at each step.

Results

On the last letter concatenation problem with 12 words, Chain of Thought is 34% accurate, while Least to Most is 74% accurate (the paper uses text-davinci-002 as a model).

Example: compositional generalization (SCAN)

The SCAN benchmark requires the model to convert natural language to sequences of actions. For example, the sentence "run left and walk twice" would be translated to "TURN_LEFT + RUN + WALK * 2". Language models perform especially poorly when confronted with sequences that are longer than those in the training set.

First attempt: Standard prompting

Using simple standard prompts, text-davinci-003 gets impressively far, but still fails.

Second attempt: Least to Most, first step - Reduction

Here, we work with 2 different prompts. The first prompt is used to reduce the input problem into a simpler sequence of steps. The second prompt is used to map this simplified sequence of steps into actual actions.

Both prompts are pretty long, and use compressed python notation for the action to save on tokens.

The first step breaks the natural language description down in a more explicit, yet still human-like language. This will help the mapping step figure things out in sequence. For example, "jump around left twice" is reduced to "jump left" -> TURN_LEFT + JUMP and "jump around left" -> (TURN_LEFT + JUMP) * 4. Similarly, the reduction step is what is used to explain the concept of repetition (twice, thrice, etc...).

Second attempt: Least to Most, second step - Mapping

In the second step, we use the output of the reduction, and again use a fairly long prompt (14 cases) to translate the reduced natural language description into a sequence of actions.

Here, we inject the output of the first step:

"jump around left twice" can be solved by: "jump left", "jump around left", "jump around left twice". "walk opposite left thrice" can be solved by: "walk opposite left", "walk opposite left thrice". So, "jump around left twice after walk opposite left thrice" can be solved by: "jump left", "jump around left", "jump around left twice", "walk opposite left", "walk opposite left thrice".

into the LLM.

Results

LtM leads to multiple improvements:

improved accuracy over Chain of Thought
increased generalization on problems harder than those in the prompt
dramatically improved performance in compositional generalization, in particular the SCAN benchmark

Standard prompting with text-davinci-002 (the model used in the paper) results in 6% of successful SCAN problems solved, while Least to Most prompting results in an impressive 76% success rate. The results are event more significant with code-davinci-002, where Least to Most prompting achieves a 99.7% success rate.

Sander Schulhoff

Sander Schulhoff is the CEO of HackAPrompt and Learn Prompting. He created the first Prompt Engineering guide on the internet, two months before ChatGPT was released, which has taught 3 million people how to prompt ChatGPT. He also partnered with OpenAI to run the first AI Red Teaming competition, HackAPrompt, which was 2x larger than the White House's subsequent AI Red Teaming competition. Today, HackAPrompt partners with the Frontier AI labs to produce research that makes their models more secure. Sander's background is in Natural Language Processing and deep reinforcement learning. He recently led the team behind The Prompt Report, the most comprehensive study of prompt engineering ever done. This 76-page survey, co-authored with OpenAI, Microsoft, Google, Princeton, Stanford, and other leading institutions, analyzed 1,500+ academic papers and covered 200+ prompting techniques.

Footnotes

Zhou, D., Schärli, N., Hou, L., Wei, J., Scales, N., Wang, X., Schuurmans, D., Cui, C., Bousquet, O., Le, Q., & Chi, E. (2022). Least-to-Most Prompting Enables Complex Reasoning in Large Language Models. ↩
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2022). Chain of Thought Prompting Elicits Reasoning in Large Language Models. ↩
Lake, B. M., & Baroni, M. (2018). Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks. https://doi.org/10.48550/arXiv.1711.00350 ↩ ↩²

DIFFICULTY LEVEL

RECOMMENDED COURSES

ChatGPT for Everyone

Introduction to Prompt Engineering

AI Red-Teaming and AI Security Masterclass

Live AI Security Courses

Least to Most Prompting

Example: Customer Inquiry Response

Example: letter concatenation

First attempt: Standard

Second attempt: Chain of Thought

Third attempt: Least to Most (single prompt)

Results

Example: compositional generalization (SCAN)

First attempt: Standard prompting

Second attempt: Least to Most, first step - Reduction

Second attempt: Least to Most, second step - Mapping

Results

Sander Schulhoff

Footnotes