Last updated on October 1, 2024
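In Chain-of-Thought (CoT) Prompting, Large Language Models (LLMs) perform both reasoning and computation. The LLM generates mathematical expressions as reasoning steps and then solves them to get the final answer. However, LLMs are not ideal candidates for solving mathematical expressions for three reasons: they are prone to arithmetic errors, especially with large numbers; they cannot solve complex expressions such as polynomial or differential equations; and they are inefficient at expressing iteration, especially over many steps.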
Program of Thoughts (PoT) prompting is a prompting technique that separates reasoning from computation. To be specific, PoT delegates computation steps to an external language interpreter. For instance, the LLM can express the reasoning steps as a Python program, which can be delegated to a Python interpreter to get an accurate response.
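The delegation step can be pictured with a minimal sketch. It assumes the model's reply is a plain Python program that stores its result in a variable named `ans`; the helper `run_pot_program` and the example program are hypothetical and only for illustration.

```python
def run_pot_program(program: str, answer_var: str = "ans"):
    """Execute a model-generated program and return the value bound to answer_var."""
    namespace = {}
    # exec() runs arbitrary code, so this is only acceptable for trusted programs.
    exec(program, namespace)
    return namespace[answer_var]


# Hypothetical program text standing in for an LLM reply.
generated_program = """
apples_per_box = 12
boxes = 7
ans = apples_per_box * boxes
"""

result = run_pot_program(generated_program)
print(result)  # 84
```

The model only has to write down the reasoning as code; the interpreter does the arithmetic.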
The example input and output pairs below, generated with CoT prompting and PoT prompting, illustrate how PoT differs from CoT.
Question: In Fibonacci sequence, it follows the rule that each number is equal to the sum of the preceding two numbers. Assuming the first two numbers are 0 and 1, what is the 50th number in Fibonacci sequence?
The first number is 0, the second number is 1, therefore, the third number is 0+1=1. The fourth number is 1+1=2. The fifth number is 1+2=3. The sixth number is 2+3=5. The seventh number is 3+5=8. The eighth number is 5+8=13. ..... (Skip 1000 tokens) The 50th number is 32,432,268,459.
Question: In Fibonacci sequence, it follows the rule that each number is equal to the sum of the preceding two numbers. Assuming the first two numbers are 0 and 1, what is the 50th number in Fibonacci sequence?
# Python code, return ans
import numpy as np

length_of_fibonacci_sequence = 50
fibonacci_sequence = np.zeros(length_of_fibonacci_sequence)
fibonacci_sequence[0] = 0
fibonacci_sequence[1] = 1
# Start at index 2 so that the third number is filled in as well
for i in range(2, length_of_fibonacci_sequence):
    fibonacci_sequence[i] = fibonacci_sequence[i-1] + fibonacci_sequence[i-2]
ans = fibonacci_sequence[-1]
It is clear that while CoT prompting generates natural language output, PoT prompting yields a Python program, which is executed in the Python interpreter to get the final response.
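As a quick check of the Fibonacci example, the same computation can be written with plain Python integers. This is an independent sketch, not the program from the original example; the function name `fibonacci_number` is an assumption introduced here.

```python
def fibonacci_number(n: int) -> int:
    """Return the n-th Fibonacci number, counting from 1, where the first two are 0 and 1."""
    previous, current = 0, 1
    for _ in range(n - 1):
        previous, current = current, previous + current
    return previous


print(fibonacci_number(50))  # 7778742049
```

The interpreter returns 7,778,742,049, whereas the CoT trace above ends with 32,432,268,459. Long chains of additions carried out in natural language are exactly where arithmetic errors slip in.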
We can use PoT in either Zero-Shot or Few-Shot Prompting settings. In Zero-Shot PoT, the prompt doesn't include any exemplar.
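A Zero-Shot PoT prompt could be assembled roughly as follows. The instruction wording, the helper `build_zero_shot_pot_prompt`, and the question text (worded to match the solver program in this section) are assumptions for illustration, not a verbatim prompt.

```python
def build_zero_shot_pot_prompt(question: str) -> str:
    """Build a prompt that asks the model to complete a solver() function, with no exemplars."""
    return (
        f"# Question: {question}\n"
        "# Answer this question by implementing a solver() function.\n"
        "def solver():\n"
        "    # Let's write a Python program step by step, and then return the answer\n"
    )


# Hypothetical question, paraphrased for illustration.
question = (
    "Toulouse has twice as many sheep as Charleston. Charleston has 4 times as many "
    "sheep as Seattle. How many sheep do the three cities have together if Seattle has 20 sheep?"
)
prompt = build_zero_shot_pot_prompt(question)
print(prompt)
```

Ending the prompt inside the function body nudges the model to continue with code rather than prose.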
The model responds with a Python program, which the Python interpreter executes to get the final output, i.e., 260.
def solver():
    seattle_sheep = 20
    charleston_sheep = seattle_sheep * 4
    toulouse_sheep = charleston_sheep * 2
    total_sheep = seattle_sheep + charleston_sheep + toulouse_sheep
    return total_sheep
# Now let's call the solver function and print the result
print(solver())
### OUTPUT
----
>>> 260
As expected, a Few-Shot PoT prompt requires exemplars demonstrating how to solve the problem. Like Zero-Shot, the output is a program that we separately execute using an interpreter to get the final output.
cost_of_original_house = 80000
increase_rate = 150 / 100
value_of_house = (1 + increase_rate) * cost_of_original_house
cost_of_repair = 50000
ans = value_of_house - cost_of_repair - cost_of_original_house
print(ans)
### OUTPUT
----
>>> 70000.0
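A Few-Shot PoT prompt could be assembled along these lines. The exemplar, the layout, and the helper `build_few_shot_pot_prompt` are hypothetical; only the `# Python code, return ans` header is taken from the example earlier in this article.

```python
# Hypothetical exemplar: a (question, program) pair shown to the model.
EXEMPLARS = [
    (
        "A pen costs 3 dollars and a notebook costs 5 dollars. "
        "How much do 4 pens and 2 notebooks cost?",
        "pen_cost = 3\nnotebook_cost = 5\nans = 4 * pen_cost + 2 * notebook_cost",
    ),
]


def build_few_shot_pot_prompt(exemplars, question: str) -> str:
    """Concatenate solved exemplars, then the new question with an open code slot."""
    parts = []
    for exemplar_question, exemplar_program in exemplars:
        parts.append(
            f"Question: {exemplar_question}\n# Python code, return ans\n{exemplar_program}\n"
        )
    parts.append(f"Question: {question}\n# Python code, return ans\n")
    return "\n".join(parts)


few_shot_prompt = build_few_shot_pot_prompt(
    EXEMPLARS, "A box holds 12 eggs. How many eggs are in 9 boxes?"
)
print(few_shot_prompt)
```

Each exemplar shows the model the expected form of the answer: descriptive variable names and a final assignment to `ans`.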
For problems requiring additional reasoning, PoT can also be used as an intermediate step to handle the computation. The code generated by PoT is executed to get an intermediate result, which is then combined with the original question in a Chain-of-Thought prompt to get the final answer.
PoT as an intermediate step
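The two-stage flow might look like the sketch below. It assumes the generated program stores its result in `ans`; the question, the follow-up wording, and both helper names are hypothetical, and the LLM calls themselves are left out.

```python
def run_program(program: str, answer_var: str = "ans"):
    """Execute a generated program (trusted code only) and return its answer variable."""
    namespace = {}
    exec(program, namespace)
    return namespace[answer_var]


def build_follow_up_prompt(question: str, intermediate_result) -> str:
    """Combine the original question with the computed value for a CoT follow-up."""
    return (
        f"Question: {question}\n"
        f"A program computed the intermediate result: {intermediate_result}\n"
        "Using this result, let's think step by step to reach the final answer."
    )


# Hypothetical question and generated program.
question = (
    "A train travels 150 km in 2.5 hours. "
    "Is its average speed closer to (A) 50 km/h or (B) 65 km/h?"
)
program = "distance_km = 150\nhours = 2.5\nans = distance_km / hours"

intermediate = run_program(program)
follow_up = build_follow_up_prompt(question, intermediate)
print(follow_up)
```

The interpreter handles the division, and the follow-up prompt leaves the comparison between options to the model's natural-language reasoning.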
Comparison of Zero-Shot PoT with similar techniques across MWP datasets
Model | GSM8K | AQuA | SVAMP | TabMWP | FinQA | ConvFin | TATQA |
---|---|---|---|---|---|---|---|
Codex CoT-SC | 78.0 | 52.0 | 86.8 | 75.4 | 44.4 | 47.9 | 63.2 |
PoT-SC-Codex | 80.0 | 58.6 | 89.1 | 81.8 | 68.1 | 67.3 | 70.2 |
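Assuming "SC" in the table stands for self-consistency, the idea is to sample several programs for the same question, execute each one, and keep the most common answer. The sketch below uses hand-written stand-ins for sampled programs; `self_consistent_answer` is a hypothetical helper, not code from the evaluated systems.

```python
from collections import Counter


def self_consistent_answer(programs, answer_var: str = "ans"):
    """Execute each candidate program and return the majority answer (None if all fail)."""
    answers = []
    for program in programs:
        namespace = {}
        try:
            exec(program, namespace)  # trusted code only
            answers.append(namespace[answer_var])
        except Exception:
            # A program that crashes or never sets the answer casts no vote.
            continue
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


# Hypothetical samples: two agree, one is wrong, one crashes.
sampled_programs = [
    "ans = 12 * 7",
    "boxes = 7\nans = boxes * 12",
    "ans = 12 * 7 + 1",
    "ans = 12 / 0",
]
print(self_consistent_answer(sampled_programs))  # 84
```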
There are two major limitations to PoT prompting:
If the generated code contains a destructive snippet such as import os; os.rmdir(), it could harm the machine running the snippets. Code snippets could also be exploited to carry out SQL injection, which could either delete data or leak confidential data.

Program-of-Thought (PoT) separates computation from reasoning by having the LLM express reasoning as structured programs rather than natural language, improving accuracy for tasks that can be represented as code, like math or accounting. However, PoT is limited to such problems and carries the risk of executing malicious code from user prompts.
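To illustrate the code-execution risk noted above, one possible precaution is to inspect a generated program before running it. The sketch below rejects programs that import modules or reference a few risky built-ins. The blocklist and the helper `looks_unsafe` are assumptions chosen for illustration; a check like this is not a sandbox and does not replace running code in an isolated environment.

```python
import ast

# Hypothetical blocklist; real deployments would need a much stricter policy.
BLOCKED_NAMES = {"__import__", "eval", "exec", "open", "compile"}


def looks_unsafe(program: str) -> bool:
    """Return True if the program imports anything or references a blocked built-in."""
    try:
        tree = ast.parse(program)
    except SyntaxError:
        return True  # Refuse anything that cannot be parsed.
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            return True
        if isinstance(node, ast.Name) and node.id in BLOCKED_NAMES:
            return True
    return False


print(looks_unsafe("import os; os.rmdir('some_dir')"))  # True
print(looks_unsafe("ans = 2 + 3"))  # False
```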