One of the major limitations of present-day large language models (LLMs) is their finite context length. The table below shows the context windows of today's most advanced language models:
| Model Name | Context Window |
|---|---|
| gpt-4o | 128,000 tokens |
| gpt-3.5-turbo | 16,385 tokens |
| Gemini 1.0 | 32,000 tokens |
As a result, for complex problems, the context required by Chain-of-Thought (CoT) prompting^{1} can grow exponentially and exceed the model's maximum allowed length.
Recursion of Thought (RoT) prompting^{2} is an inference framework that applies a divide-and-conquer strategy: it splits a problem into multiple sub-problems, solves each in a separate context, and aggregates their answers into a final answer.
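To build intuition for the divide-and-conquer idea (this is a plain-Python illustration, not the authors' trained model), a minimal sketch can split a many-digit addition into fixed-size chunk additions, each chunk playing the role of a sub-problem small enough to solve in its own limited context:

```python
def add_by_chunks(a: str, b: str, chunk: int = 4) -> str:
    """Add two non-negative decimal integers given as strings by
    splitting them into fixed-size chunks (sub-problems), solving
    each chunk independently, and propagating the carry."""
    # Pad both numbers to the same width, a multiple of the chunk size.
    width = ((max(len(a), len(b)) + chunk - 1) // chunk) * chunk
    a, b = a.zfill(width), b.zfill(width)

    carry, parts = 0, []
    # Process chunks from least-significant to most-significant.
    for i in range(width, 0, -chunk):
        # Each chunk addition is a small sub-problem a context-limited
        # model could handle on its own.
        s = int(a[i - chunk:i]) + int(b[i - chunk:i]) + carry
        carry, part = divmod(s, 10 ** chunk)
        parts.append(str(part).zfill(chunk))

    digits = "".join(reversed(parts))
    result = (str(carry) + digits) if carry else digits
    return result.lstrip("0") or "0"

print(add_by_chunks("987654321098765432", "123456789012345678"))
```

No single sub-problem here ever involves more than `chunk` digits, which is the same reason RoT can solve 48-digit arithmetic despite a bounded per-context length.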
Unlike other methods in this section, you cannot simply prompt the LLM to use recursive divide and conquer. For Recursion of Thought to work, you must first train the model on a training dataset. RoT is a model-agnostic framework, so you can apply it to an LLM or any other language model.
Training Data sample for Recursion of Thought^{2}
During inference, RoT uses special tokens (`GO`, `THINK`, and `STOP`) for recursive context control. As expected, `GO` and `STOP` mark the start and end of a problem sequence. When the model needs to decompose the problem into a simpler sub-problem and solve it, it produces a `THINK` token. It then substitutes the solution to the sub-problem back into the problem sequence. The process continues until we get the final solution.
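A toy simulation of this control flow may help. The `GO`/`THINK`/`STOP` tokens come from the paper, but the solver below is a hand-written stand-in for the trained model, and the "sum of many terms" problem format is an assumption chosen for illustration:

```python
GO, THINK, STOP = "<GO>", "<THINK>", "<STOP>"

def solve(problem: str, depth: int = 0) -> str:
    """Toy stand-in for a trained RoT model: recursively reduces a
    sum of many terms by spawning pairwise-addition sub-problems,
    each solved in a fresh 'context'."""
    terms = problem.split("+")
    if len(terms) <= 2:
        # Base case: small enough to answer directly in one context.
        return str(sum(int(t) for t in terms))
    # Emit THINK: spawn a sub-problem for the first two terms.
    sub = "+".join(terms[:2])
    print("  " * depth + f"{GO} {sub} {THINK}")
    answer = solve(sub, depth + 1)          # solved in a separate context
    # Substitute the sub-answer back into the problem sequence.
    reduced = "+".join([answer] + terms[2:])
    print("  " * depth + f"substitute -> {reduced} {STOP}")
    return solve(reduced, depth)

print(solve("12+34+56+78"))
```

Each recursive call starts from a short problem string rather than the full solution trace, which is what keeps every individual context within the model's length limit.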
The authors train GPT-3 using the RoT framework for 48-digit addition/subtraction and 16-digit multiplication/division. CoT cannot solve those problems due to the context limit of 2048 tokens in GPT-3. Consequently, there is no apples-to-apples comparison. However, GPT-3 trained with RoT achieves near-perfect accuracy in both tasks.
Results are similar for other problems such as Longest Common Subsequence (LCS), Longest Palindromic Subsequence (LPS), 0-1 knapsack, and Matrix Chain Multiplication (MCM). While CoT cannot solve them beyond a certain complexity, RoT achieves a perfect score.
There are a few limitations to RoT that make the framework difficult to adopt. Chief among them: RoT requires training the model on a labeled dataset before it can be used for inference, so you cannot apply it to an off-the-shelf model through prompting alone. Still, for complex problems where CoT length grows rapidly with complexity and models like GPT-3 cannot accommodate the required context, RoT lets language models with limited context length solve such problems. As such, RoT can be a good alternative to large, expensive models like GPT-4.