Program of Thoughts (PoT) Prompting separates reasoning from computation by delegating calculations to external interpreters like Python, reducing computational errors.
PoT vs. Chain-of-Thought (CoT): Unlike CoT, which merges reasoning and computation, PoT improves accuracy by using structured programs for complex tasks.
Tasks best suited for PoT: PoT excels in math word problems, financial reasoning, and cases where exact computation is required.
Limitations include the potential risks of executing generated code and reduced effectiveness on datasets with diverse question types like AQuA.

What is Program of Thoughts Prompting?

In Chain-of-Thought (CoT) Prompting [1], Large Language Models (LLMs) perform both the reasoning and the computation. The LLM generates mathematical expressions as reasoning steps and then evaluates them itself to get the final answer. However, LLMs are poorly suited to evaluating mathematical expressions, for three reasons:

LLMs are prone to computational errors.
LLMs are incapable of solving complex mathematical expressions involving polynomial equations, differential equations, etc.
LLMs are highly inefficient at expressing iteration.

Program of Thoughts (PoT) prompting [2] is a prompting technique that separates reasoning from computation. Specifically, PoT delegates the computation steps to an external language interpreter. For instance, the LLM can express its reasoning steps as a Python program, which is then run by a Python interpreter to obtain an accurate result.
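The delegation step can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the helper name `run_generated_program` and the hard-coded `generated_program` string are assumptions standing in for a real model response, and `exec` is used without any sandboxing.

```python
def run_generated_program(program: str, answer_variable: str = "ans"):
    """Execute model-generated Python source and return the value it stored."""
    namespace = {}
    # WARNING: exec runs arbitrary code. Use only on trusted or sandboxed input.
    exec(program, namespace)
    return namespace[answer_variable]


# Stand-in for an LLM response (hypothetical; no model is called here).
generated_program = """
apples_per_crate = 48
crates = 125
spoiled_apples = 317
ans = apples_per_crate * crates - spoiled_apples
"""

print(run_generated_program(generated_program))  # prints 5683
```

The model only has to write down the reasoning as variables and expressions; the interpreter does the arithmetic.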

The example below shows responses to the same question generated with CoT prompting and with PoT prompting, and illustrates how PoT differs from CoT.

Prompt

Question: In Fibonacci sequence, it follows the rule that each number is equal to the sum of the preceding two numbers. Assuming the first two numbers are 0 and 1, what is the 50th number in Fibonacci sequence?

A sample response generated using CoT prompting:

AI Output

The first number is 0, the second number is 1, therefore, the third number is 0+1=1. The fourth number is 1+1=2. The fifth number is 1+2=3. The sixth number is 2+3=5. The seventh number is 3+5=8. The eighth number is 5+8=13. ..... (Skip 1000 tokens) The 50th number is 32,432,268,459.

A sample response generated using PoT prompting:

Prompt

# Python code, return ans

AI Output

import numpy as np

length_of_fibonacci_sequence = 50
fibonacci_sequence = np.zeros(length_of_fibonacci_sequence)
fibonacci_sequence[0] = 0
fibonacci_sequence[1] = 1
# Fill in each remaining entry from the two entries before it
for i in range(2, length_of_fibonacci_sequence):
    fibonacci_sequence[i] = fibonacci_sequence[i-1] + fibonacci_sequence[i-2]
ans = fibonacci_sequence[-1]

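The contrast is clear: CoT prompting produces a long natural-language derivation, and in this example its final value is wrong (the 50th number, counting from 0 and 1, is 7,778,742,049). PoT prompting instead yields a short Python program, which is executed in the Python interpreter to get the final answer.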
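As a cross-check on the Fibonacci example, the same iteration can be written with plain Python integers, which are exact at any size. The function name `fibonacci_number` and the 1-based `position` argument are choices made for this sketch, not part of the original example.

```python
def fibonacci_number(position: int) -> int:
    """Return the number at 1-based `position` in the sequence 0, 1, 1, 2, ..."""
    previous, current = 0, 1
    # Advance the pair (previous, current) one step per remaining position.
    for _ in range(position - 1):
        previous, current = current, previous + current
    return previous


print(fibonacci_number(8))   # prints 13, matching the eighth number in the CoT response
print(fibonacci_number(50))  # prints 7778742049
```

A loop like this costs the model a few dozen tokens to write, whereas spelling out every addition in natural language takes on the order of a thousand.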

How to Use Program of Thoughts Prompting?

We can use PoT in either Zero-Shot or Few-Shot Prompting settings. In Zero-Shot PoT, the prompt doesn't include any exemplar.

In response, the model generates a Python program, which the Python interpreter executes to get the final output (260 in the example below).

def solver():
    seattle_sheep = 20
    charleston_sheep = seattle_sheep * 4
    toulouse_sheep = charleston_sheep * 2
    
    total_sheep = seattle_sheep + charleston_sheep + toulouse_sheep
    
    return total_sheep

# Now let's call the solver function and print the result
print(solver())

### OUTPUT
----
>>> 260
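A Zero-Shot PoT prompt can be assembled from a fixed template and the question. The sketch below is illustrative only: the template wording, the helper `build_zero_shot_pot_prompt`, and the question text (inferred from the variable names in the program above) are all assumptions rather than the exact prompt used to produce that output.

```python
# Hypothetical template: it states the question as a comment and opens a
# solver() function for the model to complete. No exemplars are included.
ZERO_SHOT_POT_TEMPLATE = (
    "# Question: {question}\n"
    "# Answer this question by implementing a solver() function.\n"
    "def solver():\n"
    "    # Let's write a Python program step by step, and then return the answer\n"
    "    # Firstly, we need to define the following variable:\n"
)


def build_zero_shot_pot_prompt(question: str) -> str:
    """Fill the template with a single question."""
    return ZERO_SHOT_POT_TEMPLATE.format(question=question.strip())


# Assumed question wording, reconstructed for illustration.
question = (
    "Toulouse has twice as many sheep as Charleston. Charleston has 4 times "
    "as many sheep as Seattle. How many sheep do they have together if "
    "Seattle has 20 sheep?"
)
print(build_zero_shot_pot_prompt(question))
```

Ending the prompt inside an open function body nudges the model to continue with code rather than prose.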

As expected, a Few-Shot PoT prompt requires exemplars demonstrating how to solve the problem. Like Zero-Shot, the output is a program that we separately execute using an interpreter to get the final output.

cost_of_original_house = 80000
increase_rate = 150 / 100
value_of_house = (1 + increase_rate) * cost_of_original_house
cost_of_repair = 50000
ans = value_of_house - cost_of_repair - cost_of_original_house
print(ans)
### OUTPUT
----
>>> 70000.0
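A Few-Shot PoT prompt places solved exemplars ahead of the new question. The following sketch assumes a simple layout that reuses the `# Python code, return ans` instruction shown earlier; the helper `build_few_shot_pot_prompt`, the bakery exemplar, and the wording of the house question are hypothetical.

```python
from typing import List, Tuple

INSTRUCTION = "# Python code, return ans"


def build_few_shot_pot_prompt(exemplars: List[Tuple[str, str]], question: str) -> str:
    """Join (question, program) exemplars, then append the unsolved question."""
    blocks = []
    for exemplar_question, exemplar_program in exemplars:
        blocks.append(
            f"Question: {exemplar_question}\n{INSTRUCTION}\n{exemplar_program.strip()}"
        )
    # The final block has no program: the model is expected to write it.
    blocks.append(f"Question: {question}\n{INSTRUCTION}")
    return "\n\n".join(blocks)


# Hypothetical exemplar, written for this sketch.
exemplars = [
    (
        "A bakery sells 35 loaves a day at $4 each. What is its revenue over 7 days?",
        "loaves_per_day = 35\nprice_per_loaf = 4\ndays = 7\n"
        "ans = loaves_per_day * price_per_loaf * days",
    ),
]
# Assumed question wording, inferred from the variable names in the program above.
new_question = (
    "A house bought for $80,000 gets $50,000 in repairs, which raises its "
    "value by 150%. What is the profit?"
)
print(build_few_shot_pot_prompt(exemplars, new_question))
```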

PoT as an Intermediate Step

For problems that require additional reasoning, PoT can also serve as an intermediate step that handles only the computation. The code generated by PoT is executed to get an intermediate result, which is then combined with the original question and passed to Chain-of-Thought Prompting to produce the final answer.

Figure: PoT as an intermediate step [2]
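The two-stage flow can be sketched as a small pipeline. Everything named here is an assumption made for illustration: `pot_then_cot`, the wording of the follow-up prompt, and the two `fake_*` functions, which stand in for LLM calls and return fixed strings.

```python
from typing import Callable


def pot_then_cot(
    question: str,
    generate_program: Callable[[str], str],
    generate_answer: Callable[[str], str],
) -> str:
    """Use PoT for the computation, then pass the result to a CoT-style prompt."""
    # Stage 1: the model writes a program that stores its result in `ans`.
    program = generate_program(question)
    namespace = {}
    exec(program, namespace)  # unsafe for untrusted code; sandbox in practice
    intermediate_result = namespace["ans"]

    # Stage 2: combine the intermediate result with the original question.
    follow_up_prompt = (
        f"Question: {question}\n"
        f"A program computed this intermediate result: {intermediate_result}\n"
        "Using this value, reason step by step and give the final answer."
    )
    return generate_answer(follow_up_prompt)


# Hypothetical stand-ins for two LLM calls; no real model is queried.
def fake_program_model(question: str) -> str:
    return "length_m = 12\nwidth_m = 7\nans = length_m * width_m"


def fake_reasoning_model(prompt: str) -> str:
    return "The area is 84 square meters, which is more than 80, so the answer is (B) yes."


question = "A room is 12 m by 7 m. Is its area larger than 80 square meters? (A) no (B) yes"
print(pot_then_cot(question, fake_program_model, fake_reasoning_model))
```

Only the arithmetic goes through the interpreter; the final choice among the options is left to natural-language reasoning.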

What Are Program of Thoughts Prompting Results?

Zero-Shot PoT outperforms Zero-Shot Chain-of-Thought across all math word problems (MWP) datasets by a significant margin.

Comparison of Zero-Shot PoT with similar techniques across MWP datasets [2]

On financial datasets, Few-Shot PoT with Self-Consistency (SC) decoding outperforms Few-Shot CoT with SC by roughly 20 percentage points on FinQA and ConvFinQA and by 7 points on TATQA. On MWP datasets the margin is smaller, roughly 2 to 7 points.
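Self-Consistency combines naturally with PoT: sample several programs, execute each one, and keep the most common answer. The sketch below assumes the sampled programs are already available as strings; `self_consistent_answer` and the four candidate programs are hypothetical, and the paper's exact voting and answer-normalization rules may differ.

```python
from collections import Counter


def self_consistent_answer(programs):
    """Execute each candidate program and return the majority value of `ans`."""
    answers = []
    for program in programs:
        namespace = {}
        try:
            exec(program, namespace)  # unsafe for untrusted code
            answers.append(namespace["ans"])
        except Exception:
            continue  # skip candidates that crash or never define `ans`
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


# Hypothetical samples for one question: two agree, one is wrong, one crashes.
sampled_programs = [
    "seattle = 20\ncharleston = seattle * 4\ntoulouse = charleston * 2\n"
    "ans = seattle + charleston + toulouse",
    "ans = 20 + 20 * 4 + 20 * 4 * 2",
    "ans = 20 * 4 * 2",
    "ans = undefined_name + 1",
]
print(self_consistent_answer(sampled_programs))  # prints 260
```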

Model	GSM8K	AQuA	SVAMP	TabMWP	FinQA	ConvFinQA	TATQA
CoT-SC (Codex)	78.0	52.0	86.8	75.4	44.4	47.9	63.2
PoT-SC (Codex)	80.0	58.6	89.1	81.8	68.1	67.3	70.2
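The margins quoted above follow directly from the table. The short script below recomputes them; the scores are copied from the two rows, with dataset names spelled out in full.

```python
datasets = ["GSM8K", "AQuA", "SVAMP", "TabMWP", "FinQA", "ConvFinQA", "TATQA"]
cot_sc = [78.0, 52.0, 86.8, 75.4, 44.4, 47.9, 63.2]
pot_sc = [80.0, 58.6, 89.1, 81.8, 68.1, 67.3, 70.2]

# Gain of PoT-SC over CoT-SC in percentage points, rounded to one decimal.
gains = {
    name: round(pot - cot, 1)
    for name, cot, pot in zip(datasets, cot_sc, pot_sc)
}

for name, gain in gains.items():
    print(f"{name}: +{gain:.1f}")
```

The gains are 2.0, 6.6, 2.3, and 6.4 points on the four MWP datasets and 23.7, 19.4, and 7.0 points on FinQA, ConvFinQA, and TATQA.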

Limitations of Program of Thoughts Prompting

There are two major limitations to PoT prompting:

PoT requires executing model-generated code. If that code is malicious, for example a snippet like import os; os.rmdir(), it could harm the machine that runs it. Generated code could also be used to mount attacks such as SQL injection, which could delete data or leak confidential data.
For datasets like AQuA, which contain a wide variety of complex questions, PoT's performance suffers. One likely reason is that the exemplars cannot cover the diversity of questions in the dataset.
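One partial mitigation for the first limitation is to inspect generated code before running it. The sketch below uses Python's `ast` module to reject imports outside an allow-list and a few risky built-ins. The policy (`ALLOWED_IMPORTS`, `FORBIDDEN_NAMES`) and the helper `find_policy_violations` are assumptions made for illustration. Static checks like this are easy to bypass and do not replace running the code in an isolated environment.

```python
import ast

ALLOWED_IMPORTS = {"math"}  # hypothetical allow-list
FORBIDDEN_NAMES = {"exec", "eval", "open", "compile", "input", "__import__"}


def find_policy_violations(program: str):
    """Return a list of reasons to refuse running `program` (empty if none found)."""
    try:
        tree = ast.parse(program)
    except SyntaxError as error:
        return [f"syntax error: {error.msg}"]

    violations = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] not in ALLOWED_IMPORTS:
                    violations.append(f"import of '{alias.name}' is not allowed")
        elif isinstance(node, ast.ImportFrom):
            module = (node.module or "").split(".")[0]
            if module not in ALLOWED_IMPORTS:
                violations.append(f"import from '{node.module}' is not allowed")
        elif isinstance(node, ast.Name) and node.id in FORBIDDEN_NAMES:
            violations.append(f"use of '{node.id}' is not allowed")
        elif isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            violations.append(f"access to '{node.attr}' is not allowed")
    return violations


print(find_policy_violations("import os; os.rmdir('some_dir')"))
print(find_policy_violations("import math\nans = math.sqrt(16)"))
```

The first call reports the disallowed `os` import; the second returns an empty list, so that program would be passed on for execution.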

Conclusion

Program of Thoughts (PoT) prompting separates computation from reasoning by having the LLM express its reasoning as a structured program rather than natural language, which improves accuracy on tasks that can be represented as code, such as math or accounting problems. However, PoT is limited to such problems and carries the risk of executing malicious code originating from user prompts.

Find more on Decomposition Prompting methods.

Footnotes

1. Jason Wei et al. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.
2. Wenhu Chen et al. (2022). Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks.

Valeriia Kuka

Valeriia Kuka, Head of Content at Learn Prompting, is passionate about making AI and ML accessible. Valeriia previously grew a 60K+ follower AI-focused social media account, earning reposts from Stanford NLP, Amazon Research, Hugging Face, and AI researchers. She has also worked with AI/ML newsletters and global communities with 100K+ members and authored clear and concise explainers and historical articles.
