

Takeaways

- **Program of Thoughts (PoT) prompting** separates reasoning from computation by delegating calculations to external interpreters like Python, reducing computational errors.
- **PoT vs. Chain-of-Thought (CoT)**: Unlike CoT, which merges reasoning and computation, PoT improves accuracy by using structured programs for complex tasks.
- **Tasks best suited for PoT**: PoT excels at math word problems, financial reasoning, and cases where exact computation is required.
- **Limitations** include the potential risks of executing generated code and reduced effectiveness on datasets with diverse question types, like AQuA.

In Chain-of-Thought (CoT) prompting, large language models (LLMs) perform both reasoning and computation^{1}. The LLM generates a mathematical expression as a reasoning step and then solves it to get the final answer. However, LLMs are not ideal candidates for solving mathematical expressions, for three reasons:

- LLMs are prone to computational errors
- LLMs are incapable of solving complex mathematical expressions involving polynomial equations, differential equations, etc.
- LLMs are highly inefficient at expressing iteration

**Program of Thoughts (PoT) prompting**^{2} is a **prompting technique that separates reasoning from computation**. To be specific, PoT delegates computation steps to an external language interpreter. For instance, the LLM can express the reasoning steps as a Python program, which can be delegated to a Python interpreter to get an accurate response.

The example input and output pairs below, generated using CoT prompting and PoT prompting, illustrate how PoT differs from CoT.

Question: In the Fibonacci sequence, each number is equal to the sum of the preceding two numbers. Assuming the first two numbers are 0 and 1, what is the 50th number in the Fibonacci sequence?

- A sample response generated using CoT prompting:

The first number is 0, the second number is 1, therefore, the third number is 0+1=1. The fourth number is 1+1=2. The fifth number is 1+2=3. The sixth number is 2+3=5. The seventh number is 3+5=8. The eighth number is 5+8=13. ..... (Skip 1000 tokens) The 50th number is 32,432,268,459.

- A sample response generated using PoT prompting:

Question: In the Fibonacci sequence, each number is equal to the sum of the preceding two numbers. Assuming the first two numbers are 0 and 1, what is the 50th number in the Fibonacci sequence?

```
# Python code, return ans
import numpy as np

length_of_fibonacci_sequence = 50
fibonacci_sequence = np.zeros(length_of_fibonacci_sequence)
fibonacci_sequence[0] = 0
fibonacci_sequence[1] = 1
for i in range(2, length_of_fibonacci_sequence):
    fibonacci_sequence[i] = fibonacci_sequence[i - 1] + fibonacci_sequence[i - 2]
ans = fibonacci_sequence[-1]
```

It is clear that while CoT prompting generates natural language output, PoT prompting yields a Python program, which is executed in the Python interpreter to get the final response.
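Conceptually, the host application only needs to capture the generated program and hand it to the interpreter. Below is a minimal sketch of such a harness; the `run_pot_program` helper is hypothetical (not from the paper), and a plain-list version of the Fibonacci program is used so no external libraries are needed.

```python
import textwrap

def run_pot_program(program: str):
    """Execute an LLM-generated program and return its `ans` variable.
    Caution: exec runs arbitrary code; only use with trusted or sandboxed input."""
    namespace = {}
    exec(program, namespace)
    return namespace["ans"]

# A plain-Python version of the generated Fibonacci program above
fib_program = textwrap.dedent("""
    length_of_fibonacci_sequence = 50
    fibonacci_sequence = [0, 1]
    for i in range(2, length_of_fibonacci_sequence):
        fibonacci_sequence.append(fibonacci_sequence[i - 1] + fibonacci_sequence[i - 2])
    ans = fibonacci_sequence[-1]
""")
print(run_pot_program(fib_program))  # 7778742049
```

Because the arithmetic runs in the interpreter rather than in the model, the result is exact regardless of how many digits the answer has.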

We can use PoT in either Zero-Shot or Few-Shot settings. In Zero-Shot PoT, the prompt doesn't include any exemplar.

The prompt generates a Python program as the output, which the Python interpreter executes to get the final output, i.e., 260.

```
def solver():
    seattle_sheep = 20
    charleston_sheep = seattle_sheep * 4
    toulouse_sheep = charleston_sheep * 2
    total_sheep = seattle_sheep + charleston_sheep + toulouse_sheep
    return total_sheep

# Now let's call the solver function and print the result
print(solver())

### OUTPUT
----
>>> 260
```

As expected, a Few-Shot PoT prompt requires exemplars demonstrating how to solve the problem. Like zero-shot, the output is a program that we separately execute using an interpreter to get the final output.

```
cost_of_original_house = 80000
increase_rate = 150 / 100
value_of_house = (1 + increase_rate) * cost_of_original_house
cost_of_repair = 50000
ans = value_of_house - cost_of_repair - cost_of_original_house
print(ans)
### OUTPUT
----
>>> 70000.0
```
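The Few-Shot setup can be sketched as a simple prompt builder: each exemplar pairs a question with its program solution, and the new question ends with the same code cue so the model continues in Python. The helper name and exemplar below are illustrative, not from the paper.

```python
def build_few_shot_pot_prompt(exemplars, question):
    """Assemble a Few-Shot PoT prompt from (question, program) exemplar pairs."""
    parts = [f"Question: {q}\n# Python code, return ans\n{code}"
             for q, code in exemplars]
    # The new question ends with the same cue so the model answers in code
    parts.append(f"Question: {question}\n# Python code, return ans")
    return "\n\n".join(parts)

exemplars = [("What is 2 + 3?", "ans = 2 + 3")]
prompt = build_few_shot_pot_prompt(exemplars, "What is 15 percent of 80?")
print(prompt)
```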

For problems requiring additional reasoning, PoT can also be used as an intermediate step to tackle the computation part. The code generated by PoT is executed to get the intermediate result, which is then substituted back into the original question, and the final answer is obtained using Chain-of-Thought prompting.

PoT as an intermediate step^{2}
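This two-stage flow can be sketched as follows; `ask_llm` is a placeholder for a real model call, and the helper name is a hypothetical choice, not from the paper.

```python
def pot_then_cot(question, generated_code, ask_llm):
    """Run the PoT program for the exact computation, then hand the numeric
    result back to the LLM for a final Chain-of-Thought reasoning pass."""
    namespace = {}
    exec(generated_code, namespace)   # computation delegated to the interpreter
    intermediate = namespace["ans"]
    followup = (f"{question}\nA program computed the intermediate result: "
                f"{intermediate}.\nLet's think step by step.")
    return ask_llm(followup)          # reasoning delegated to the LLM

# Usage with a stub standing in for a real LLM call:
echo = lambda p: p
out = pot_then_cot("How many sheep are there in total?",
                   "ans = 20 + 20 * 4 + 20 * 4 * 2", echo)
print(out)
```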

- Zero-shot PoT outperforms Zero-Shot Chain-of-Thought across all math word problems (MWP) datasets by a significant margin.

Comparison of Zero-Shot PoT with similar techniques across MWP datasets^{2}

- On financial datasets, Few-Shot PoT + Self-Consistency (SC) decoding outperforms Few-Shot CoT + SC by roughly 20% on FinQA/ConvFinQA and 7% on TATQA. On MWP datasets, Few-Shot PoT + SC wins by a smaller margin of roughly 2-6%.

Model | GSM8K | AQuA | SVAMP | TabMWP | FinQA | ConvFin | TATQA
---|---|---|---|---|---|---|---
Codex CoT-SC | 78.0 | 52.0 | 86.8 | 75.4 | 44.4 | 47.9 | 63.2
PoT-SC-Codex | 80.0 | 58.6 | 89.1 | 81.8 | 68.1 | 67.3 | 70.2

There are two major limitations to PoT prompting:

- PoT requires the execution of generated code. If the code is malicious and contains snippets like `import os; os.rmdir()`, it could harm the machine running it. Generated code could also be exploited to run SQL injection, which could delete data or leak confidential data.
- For datasets like AQuA, which contain a complex and wide variety of questions, PoT's performance suffers. The likely reason is that the exemplars cannot cover the diversity of questions in the dataset.
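As a partial mitigation, generated code can be statically screened before execution. The check below is illustrative only (the helper name and the module denylist are assumptions, not from the paper); real deployments need proper sandboxing with process isolation and resource limits.

```python
import ast

# Naive denylist of modules a generated program should never import
BLOCKED_MODULES = {"os", "sys", "subprocess", "shutil", "socket"}

def is_code_safe(code: str) -> bool:
    """Reject generated code that imports any blocked module."""
    tree = ast.parse(code)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            if any(alias.name.split(".")[0] in BLOCKED_MODULES
                   for alias in node.names):
                return False
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in BLOCKED_MODULES:
                return False
    return True

print(is_code_safe("import os; os.rmdir('tmp')"))  # False
print(is_code_safe("ans = 1 + 1"))                 # True
```

Note that such a denylist is easy to bypass (e.g. via `__import__` or `eval`), which is why stronger isolation is the recommended defense.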

Program-of-Thought (PoT) separates computation from reasoning by having the LLM express reasoning as structured programs rather than natural language, improving accuracy for tasks that can be represented as code, like math or accounting. However, PoT is limited to such problems and carries the risk of executing malicious code from user prompts.


Copyright © 2024 Learn Prompting.