
🟦 Program-of-Thoughts Prompting

Last updated on September 4, 2024 by Bhuwan Bhatt
Takeaways
  • Program-of-Thoughts (PoT) prompting separates reasoning from computation by delegating calculations to external interpreters like Python, reducing computational errors.
  • PoT vs. Chain-of-Thought (CoT): Unlike CoT, which merges reasoning and computation, PoT improves accuracy by using structured programs for complex tasks.
  • Tasks best suited for PoT: PoT excels in math word problems, financial reasoning, and cases where exact computation is required.
  • Limitations include the potential risks of executing generated code and reduced effectiveness on datasets with diverse question types like AQuA.

What is Program-of-Thoughts Prompting?

In Chain-of-Thought (CoT) prompting [1], Large Language Models (LLMs) perform both reasoning and computation. The LLM generates mathematical expressions as reasoning steps and then solves them itself to get the final answer. However, LLMs are not ideal candidates for solving mathematical expressions, for three reasons:

  • LLMs are prone to computational errors
  • LLMs are incapable of solving complex mathematical expressions involving polynomial equations, differential equations, etc.
  • LLMs are highly inefficient at expressing iteration

Program-of-Thoughts (PoT) prompting [2] is a prompting technique that separates reasoning from computation. Specifically, PoT delegates the computation steps to an external language interpreter. For instance, the LLM can express its reasoning steps as a Python program, which is then run by a Python interpreter to get an accurate result.
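The delegation step can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the helper name run_generated_program and the hard-coded program string (standing in for an LLM response) are assumptions, and exec() should only be used on code you trust.

```python
def run_generated_program(program: str, answer_variable: str = "ans"):
    """Execute a generated program and return the value it stored in `ans`.

    Hypothetical helper for illustration: exec() runs arbitrary code,
    so this is only appropriate for trusted input.
    """
    namespace = {}
    exec(program, namespace)  # the interpreter, not the LLM, does the arithmetic
    return namespace[answer_variable]


# Stand-in for an LLM response (assumption: the model stores its answer in `ans`).
generated_program = """
price_per_item = 12.5
number_of_items = 8
ans = price_per_item * number_of_items
"""

print(run_generated_program(generated_program))  # prints 100.0
```

The model only has to get the program right; the arithmetic itself is never left to token prediction.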

The example below shows the input and output for the same question under CoT prompting and PoT prompting, illustrating how PoT differs from CoT.


Prompt


Question: In Fibonacci sequence, it follows the rule that each number is equal to the sum of the preceding two numbers. Assuming the first two numbers are 0 and 1, what is the 50th number in Fibonacci sequence?

  • A sample response generated using CoT prompting:

AI Output


The first number is 0, the second number is 1, therefore, the third number is 0+1=1. The fourth number is 1+1=2. The fifth number is 1+2=3. The sixth number is 2+3=5. The seventh number is 3+5=8. The eighth number is 5+8=13. ..... (Skip 1000 tokens) The 50th number is 32,432,268,459.

  • A sample response generated using PoT prompting:

Prompt


Question: In Fibonacci sequence, it follows the rule that each number is equal to the sum of the preceding two numbers.
Assuming the first two numbers are 0 and 1, what is the 50th number in Fibonacci sequence?

# Python code, return ans

AI Output


import numpy as np

length_of_fibonacci_sequence = 50
fibonacci_sequence = np.zeros(length_of_fibonacci_sequence)
fibonacci_sequence[0] = 0
fibonacci_sequence[1] = 1
# Fill in every remaining entry from the two entries before it
for i in range(2, length_of_fibonacci_sequence):
    fibonacci_sequence[i] = fibonacci_sequence[i-1] + fibonacci_sequence[i-2]
ans = fibonacci_sequence[-1]

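CoT prompting produces its answer in natural language, whereas PoT prompting yields a Python program that is executed by a Python interpreter to get the final answer. The CoT answer above also shows the risk of letting the model do the arithmetic: 32,432,268,459 is not a Fibonacci number, and the 50th number in this sequence is 7,778,742,049.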
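As a quick check of the value this question asks for, the following sketch computes the 50th number with exact integer arithmetic. The function name nth_fibonacci is an illustrative choice, and the indexing follows the question as stated (the first number is 0 and the second is 1).

```python
def nth_fibonacci(n: int) -> int:
    """Return the n-th number (counting from 1) of the sequence 0, 1, 1, 2, 3, ..."""
    previous, current = 0, 1
    for _ in range(n - 1):
        previous, current = current, previous + current
    return previous


ans = nth_fibonacci(50)
print(ans)  # prints 7778742049
```

Python integers do not overflow, so the loop stays exact no matter how many steps it runs.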

How to Use Program-of-Thoughts Prompting?

We can use PoT in either zero-shot or few-shot settings. In zero-shot PoT, the prompt doesn't include any exemplars.

For a question about the total number of sheep in Seattle, Charleston, and Toulouse, the model generates the Python program below as its output, which the Python interpreter executes to get the final answer, i.e., 260.

def solver():
    seattle_sheep = 20
    charleston_sheep = seattle_sheep * 4
    toulouse_sheep = charleston_sheep * 2
    
    total_sheep = seattle_sheep + charleston_sheep + toulouse_sheep
    
    return total_sheep

# Now let's call the solver function and print the result
print(solver())

### OUTPUT
----
>>> 260

As expected, a few-shot PoT prompt includes exemplars demonstrating how to solve similar problems. As in the zero-shot case, the output is a program that we execute separately using an interpreter to get the final answer.

cost_of_original_house = 80000
increase_rate = 150 / 100
value_of_house = (1 + increase_rate) * cost_of_original_house
cost_of_repair = 50000
ans = value_of_house - cost_of_repair - cost_of_original_house
print(ans)
### OUTPUT
----
>>> 70000.0
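Assembling the prompt for both settings can be sketched as follows. The "Question:" prefix and the "# Python code, return ans" suffix are taken from the Fibonacci prompt earlier in this article; the helper name build_pot_prompt, the exemplar program, and the target question are assumptions made for illustration.

```python
PROMPT_SUFFIX = "# Python code, return ans"


def build_pot_prompt(question: str, exemplars=()) -> str:
    """Build a PoT prompt; with no exemplars it is zero-shot, otherwise few-shot."""
    parts = []
    for exemplar_question, exemplar_program in exemplars:
        # Each exemplar pairs a question with a worked program.
        parts.append(
            f"Question: {exemplar_question}\n{PROMPT_SUFFIX}\n{exemplar_program.strip()}"
        )
    # The target question ends with the suffix so the model continues with code.
    parts.append(f"Question: {question}\n{PROMPT_SUFFIX}")
    return "\n\n".join(parts)


# Hypothetical exemplar and target question, for illustration only.
exemplar = (
    "What is the sum of the integers from 1 to 100?",
    "ans = sum(range(1, 101))",
)
target_question = "What is the sum of the squares of the integers from 1 to 10?"

zero_shot_prompt = build_pot_prompt(target_question)
few_shot_prompt = build_pot_prompt(target_question, exemplars=[exemplar])
print(few_shot_prompt)
```

Either prompt is sent to the model, and the program it returns is then executed separately.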

PoT as an Intermediate Step

For problems requiring additional reasoning, PoT can also be used as an intermediate step that handles only the computation. The code generated by PoT is executed to get an intermediate result, which is then combined with the original question and passed back to the model to get the final answer using CoT.

PoT as an intermediate step [2]
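The two-stage flow might look like the sketch below. The helper names, the wording of the follow-up prompt, and the sample question are all assumptions for illustration; the paper's exact prompt format may differ, and exec() is only suitable for trusted code.

```python
def run_program(program: str, answer_variable: str = "ans"):
    """Execute the generated program and return its intermediate result."""
    namespace = {}
    exec(program, namespace)  # assumption: the program is trusted
    return namespace[answer_variable]


def build_cot_followup(question: str, program: str, result) -> str:
    """Combine the question, the program, and its result into a CoT prompt."""
    return (
        f"Question: {question}\n"
        f"# Python code, return ans\n"
        f"{program.strip()}\n"
        f"# Executed result: ans = {result}\n"
        "Using this result, let's think step by step to reach the final answer."
    )


# Hypothetical question and generated program.
question = (
    "A train covers 150 km in 2.5 hours. "
    "Is its average speed closer to 50 km/h or 65 km/h?"
)
program = "distance_km = 150\nhours = 2.5\nans = distance_km / hours"

result = run_program(program)  # stage 1: exact computation
followup_prompt = build_cot_followup(question, program, result)  # stage 2: CoT
print(followup_prompt)
```

The follow-up prompt is then sent back to the model, which only has to reason about the already-computed value.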

What Are Program-of-Thoughts Prompting Results?

  • Zero-shot PoT outperforms zero-shot CoT across all Math Word Problems (MWP) datasets by a significant margin.

Comparison of zero-shot PoT with similar techniques across MWP datasets [2]

  • On financial datasets, few-shot PoT + Self-Consistency (SC) decoding outperforms few-shot CoT + SC by roughly 20 percentage points on FinQA/ConvFinQA and 7 points on TATQA. On MWP datasets, few-shot PoT + SC wins by a smaller margin of roughly 2-7 points.
Model          GSM8K   AQuA   SVAMP   TabWMP   FinQA   ConvFin   TATQA
Codex CoT-SC   78.0    52.0   86.8    75.4     44.4    47.9      63.2
PoT-SC-Codex   80.0    58.6   89.1    81.8     68.1    67.3      70.2
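The margins quoted above can be recomputed directly from the table. The scores below are copied from the two rows; the variable names are illustrative.

```python
datasets = ["GSM8K", "AQuA", "SVAMP", "TabWMP", "FinQA", "ConvFin", "TATQA"]
cot_sc_scores = [78.0, 52.0, 86.8, 75.4, 44.4, 47.9, 63.2]
pot_sc_scores = [80.0, 58.6, 89.1, 81.8, 68.1, 67.3, 70.2]

# Gain of PoT-SC over CoT-SC in percentage points, rounded to one decimal place.
gains = {
    name: round(pot - cot, 1)
    for name, cot, pot in zip(datasets, cot_sc_scores, pot_sc_scores)
}

for name, gain in gains.items():
    print(f"{name}: +{gain}")
```

The largest gains are on FinQA (+23.7) and ConvFin (+19.4), and the smallest is on GSM8K (+2.0).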

Limitations of Program-of-Thoughts Prompting

There are two major limitations to PoT prompting:

  • PoT requires executing generated code. If the code is malicious and contains snippets like import os; os.rmdir(path), it could harm the machine running it. Generated code could also be used to carry out SQL injection, which could either delete data or leak confidential data.
  • For datasets like AQuA, which contain a complex and wide variety of questions, PoT's performance suffers. The reason behind this could be that the exemplars cannot cover the diversity of questions in the dataset.
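One partial mitigation for the first limitation is to run generated code in a separate process with a timeout instead of calling exec() in the main application. The sketch below is an assumption-laden illustration, not a sandbox: it bounds runtime and keeps crashes out of the main process, but it does not restrict file system, network, or database access, which needs OS-level isolation such as a container. The helper name run_in_subprocess is hypothetical.

```python
import subprocess
import sys


def run_in_subprocess(program: str, timeout_seconds: float = 5.0) -> str:
    """Run a generated program in a child Python process and return the printed `ans`.

    Not a security sandbox: this only limits runtime and isolates crashes.
    """
    completed = subprocess.run(
        # -I starts Python in isolated mode (ignores environment variables
        # and the user site directory).
        [sys.executable, "-I", "-c", program + "\nprint(ans)"],
        capture_output=True,
        text=True,
        timeout=timeout_seconds,  # raises subprocess.TimeoutExpired if exceeded
    )
    if completed.returncode != 0:
        raise RuntimeError(completed.stderr.strip())
    return completed.stdout.strip()


print(run_in_subprocess("ans = 20 + 80 + 160"))  # prints 260
```

The result comes back as text, so the caller has to parse it into a number if needed.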

Conclusion

Program-of-Thoughts (PoT) prompting separates computation from reasoning by having the LLM express its reasoning as a structured program rather than natural language, improving accuracy on tasks that can be represented as code, like math or accounting. However, PoT is limited to such problems and carries the risk of executing malicious generated code.

Footnotes

  1. Wei, J., et al. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.

  2. Chen, W., et al. (2022). Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks.

Copyright © 2024 Learn Prompting.