LLMs that Reason and Act

🟦 This article is rated medium

Reading Time: 2 minutes

Last updated on August 7, 2024

ReAct(reason, act) is a paradigm for enabling language models to solve complex tasks using natural language reasoning. ReAct is designed for tasks in which the LLM is allowed to perform certain actions. For example, as in a MRKL system, a LLM may be able to interact with external APIs to retrieve information. When asked a question, the LLM could choose to perform an action to retrieve information, and then answer the question based on the retrieved information.

ReAct Systems can be thought of as MRKL systems, with the added ability to reason about the actions they can perform.

Examine the following image. The question in the top box is sourced from HotPotQA, a question answering dataset that requires complex reasoning. ReAct is able to answer the question by first reasoning about the question (Thought 1), and then performing an action (Act 1) to send a query to Google. It then receives an observation (Obs 1), and continues with this thought, action, observation loop until it reaches a conclusion (Act 3).

ReAct System (Yao et al.)

Readers with knowledge of reinforcement learning may recognize this process as similar to the classic RL loop of state, action, reward, state,... ReAct provides some formalization for this in their paper.

Results

Google used the PaLM LLM in experiments with ReAct. Comparisons to standard prompting (question only), CoT, and other configurations show that ReAct's performance is promising for complex reasoning tasks. Google also performs studies on the FEVER dataset, which covers fact extraction and verification.

ReAct Results (Yao et al.)

Sander Schulhoff

Sander Schulhoff is the CEO of HackAPrompt and Learn Prompting. He created the first Prompt Engineering guide on the internet, two months before ChatGPT was released, which has taught 3 million people how to prompt ChatGPT. He also partnered with OpenAI to run the first AI Red Teaming competition, HackAPrompt, which was 2x larger than the White House's subsequent AI Red Teaming competition. Today, HackAPrompt partners with the Frontier AI labs to produce research that makes their models more secure. Sander's background is in Natural Language Processing and deep reinforcement learning. He recently led the team behind The Prompt Report, the most comprehensive study of prompt engineering ever done. This 76-page survey, co-authored with OpenAI, Microsoft, Google, Princeton, Stanford, and other leading institutions, analyzed 1,500+ academic papers and covered 200+ prompting techniques.

Footnotes

Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2022). ↩
Yang, Z., Qi, P., Zhang, S., Bengio, Y., Cohen, W. W., Salakhutdinov, R., & Manning, C. D. (2018). HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering. ↩
Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra, G., Roberts, A., Barham, P., Chung, H. W., Sutton, C., Gehrmann, S., Schuh, P., Shi, K., Tsvyashchenko, S., Maynez, J., Rao, A., Barnes, P., Tay, Y., Shazeer, N., Prabhakaran, V., … Fiedel, N. (2022). PaLM: Scaling Language Modeling with Pathways. ↩
Thorne, J., Vlachos, A., Christodoulopoulos, C., & Mittal, A. (2018). FEVER: a large-scale dataset for Fact Extraction and VERification. ↩

DIFFICULTY LEVEL

RECOMMENDED COURSES

ChatGPT for Everyone

Introduction to Prompt Engineering

AI Red-Teaming and AI Security Masterclass

Live AI Security Courses

LLMs that Reason and Act

Results

Sander Schulhoff

Footnotes