
🟦 Active Prompting

Last updated on October 3, 2024 by Valeriia Kuka
[Figure: Overview of Active Prompting]

Information and Links

Technique | Institution | Date of Publication | Paper | Code
Active Prompting | The Hong Kong University of Science and Technology, University of Toronto, The University of Hong Kong, University of Illinois Urbana-Champaign | Feb 2023 | Active Prompting with Chain-of-Thought for Large Language Models | Code

What is Active Prompting?

Active Prompting (or Active-Prompt) [1] is a technique that improves Chain-of-Thought (CoT) prompting by having humans annotate exemplars only for the questions where the model is most uncertain. This focuses limited annotation effort on the questions the model finds hardest.

Active Prompting consists of four main steps:

  1. Uncertainty Estimation:
  • The model is prompted k times for each question in a pool of unlabeled questions, using either Chain-of-Thought prompting with a few human-written chains of thought or Zero-Shot CoT prompting with none. Each run produces an answer with intermediate reasoning steps.
  • An uncertainty score u is computed for each question from its k answers, using a chosen uncertainty metric.
  2. Selection: The questions with the highest uncertainty are selected for human annotation.
  3. Annotation: The questions selected in step 2 are annotated by humans with a reasoning chain and the correct answer.
  4. Inference: The newly annotated exemplars are used as few-shot examples when the model answers new questions.

Active Prompting saves human effort because only the selected questions need annotation, not the whole training set. In the paper's experiments, it outperforms other techniques such as Automatic Chain-of-Thought prompting, Random Chain-of-Thought prompting, and Self-Consistency on a range of reasoning tasks. According to its authors [1], Active Prompting is the first work to show the benefits of selective question annotation in CoT prompting for solving complex reasoning tasks.

How to Use Active Prompting?

Let’s break down the Active Prompting process with an example. Assume you have a pool of n unlabeled questions.

Step 1. Uncertainty Estimation

First, you prompt the model k times for each unlabeled question using one of:

  • A few annotated exemplars, if you use the CoT option
  • The instruction "Let's think step by step", if you use the Zero-Shot CoT option

Let’s say you choose the CoT option. You provide exemplars (Q1 and Q2), then ask your pool question (Q3). Repeat this process k times for each question.

Prompting k times with CoT option


Q1: Josh and Anna were both born on August 17th, but in different years. To consolidate celebrations they also got married on August 17 when Josh turned 22. If today they’re celebrating 30 years of marriage and their combined age is exactly 5 times what Josh’s age was when they married, how old was Anna when they got married?

A1: Let’s think step by step. To calculate how old was Anna when they got married, we have to know their combined age, Josh’s age after 30 years, and Anna’s age after 30 years from their marriage. Since their combined age is 5 times Josh’s age when he got married, their combined age is 5 * 22 = 110 years. Josh must be 30 years older than his age when they got married, so he is 22 + 30 = 52 years old now. Therefore, Anna’s current age will be 110 - 52 = 58 years. If they married 30 years ago, Anna must have been 58 - 30 = 28 years old when they married. The answer is 28.

Q2: John buys a chair. He then buys a table that is 3 times the price of the chair. Then, he buys a couch that is 5 times the price of the table. If John paid $380 for all these items, what is the price of the couch?

A2: Let’s think step by step. To calculate the price of the couch, we need to know the price of the chair, the price of the table, and the relation between the chair, table, couch, and total money paid. Let x be the price of the chair, 3 * x be the price of the table, and 5 * (3 * x) = 15 * x be the price of the couch. The relationship between the chair, table, couch, and the total price paid is x + 3 * x + 15 * x = $380, which is 19 * x = 380, and x=20. The price of the couch is 15 * x, which is 15 * 20 = $300. The answer is 300.

Q3: John has 5 apples, and he gives 2 to Mary. How many apples does John have left?

As a result, you get k answers for each of your n questions.
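
The sampling loop for one question can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: `generate` is a hypothetical stand-in for whatever LLM call you use (sampled with a temperature above zero so that runs can differ), `fake_generate` returns canned text for illustration only, and `extract_answer` assumes each completion ends with "The answer is X." as in the exemplars.

```python
import itertools
import re
from typing import Callable, List, Optional


def extract_answer(completion: str) -> Optional[str]:
    """Pull the final answer out of a completion containing 'The answer is X.'."""
    match = re.search(r"The answer is\s*\$?(-?[\d.,]*\d)", completion)
    return match.group(1).replace(",", "") if match else None


def sample_answers(
    generate: Callable[[str], str], prompt: str, k: int
) -> List[Optional[str]]:
    """Call the model k times on the same prompt and collect the extracted answers."""
    return [extract_answer(generate(prompt)) for _ in range(k)]


# Hypothetical stand-in for a real LLM call: it cycles through canned completions.
_canned = itertools.cycle([
    "John starts with 5 apples and gives away 2. The answer is 3.",
    "He gives 2 of his 5 apples to Mary. The answer is 3.",
    "5 + 2 = 7. The answer is 7.",
])


def fake_generate(prompt: str) -> str:
    return next(_canned)


question = "Q3: John has 5 apples, and he gives 2 to Mary. How many apples does John have left?"
answers = sample_answers(fake_generate, question, k=5)
print(answers)
```

Running the same loop over every question in the pool gives the k answers per question that the uncertainty metric needs.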

Next, you measure the model's uncertainty on each question from the k answers it generated for that question.

To do that, you select an uncertainty metric. An example metric is disagreement:

Disagreement measures how many distinct answers appear among the k answers generated for a given question from the pool.

  • Count the number of unique answers generated for the question, h
  • Calculate the disagreement as uncertainty u = h/k
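
As a small illustration, the disagreement metric can be computed like this. The function name `disagreement` and the sample answer lists are assumptions made for this sketch; only the formula h/k comes from the description above.

```python
from typing import Hashable, Sequence


def disagreement(answers: Sequence[Hashable]) -> float:
    """Return u = h / k: the number of unique answers (h) over all answers (k)."""
    k = len(answers)
    if k == 0:
        raise ValueError("need at least one sampled answer")
    h = len(set(answers))
    return h / k


# All five runs agree: h = 1, k = 5.
print(disagreement(["3", "3", "3", "3", "3"]))  # 0.2
# Four distinct answers in five runs: h = 4, k = 5.
print(disagreement(["3", "7", "2", "3", "5"]))  # 0.8
```

Note that with this definition the score never reaches 0: it ranges from 1/k (all answers agree) to 1 (every answer is different).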

Step 2. Selection

Then, you select the questions with the highest uncertainty based on the metric. For simplicity, let's review just one example question from the set of the most uncertain questions.
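
Selection then reduces to ranking questions by their uncertainty score and keeping the top few. The sketch below assumes the scores are already stored in a dictionary; the function name `select_most_uncertain`, the parameter `m` (how many questions you can afford to annotate), and the example scores are all hypothetical.

```python
from typing import Dict, List


def select_most_uncertain(uncertainty: Dict[str, float], m: int) -> List[str]:
    """Return the m questions with the highest uncertainty score."""
    # sorted() is stable, so ties keep their original insertion order.
    ranked = sorted(uncertainty, key=lambda q: uncertainty[q], reverse=True)
    return ranked[:m]


# Hypothetical uncertainty scores for a pool of four questions.
scores = {
    "question A": 0.2,
    "question B": 0.8,
    "question C": 0.6,
    "question D": 0.2,
}
print(select_most_uncertain(scores, m=2))  # ['question B', 'question C']
```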

Imagine you use disagreement as the uncertainty metric. You prompt the model k times and find that the question below keeps yielding different outputs, meaning the model is uncertain about the answer:

Prompt


John has 5 apples, and he gives 2 to Mary.

How many apples does John have left?

This is just one example; in practice, many questions may be selected.

Step 3. Annotation

You manually annotate the selected question to provide a clear, correct answer:

Annotated Question


Q: John has 5 apples, and he gives 2 to Mary. How many apples does John have left?

A: John starts with 5 apples. He gives away 2 apples. Therefore, he has 5 - 2 = 3 apples left.

Step 4. Inference

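The annotated question and its answer become an exemplar for the model: at inference time, the annotated exemplars are placed before each new question as few-shot CoT examples.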
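
A minimal sketch of how annotated exemplars might be assembled into an inference prompt is shown below. The `Q:`/`A:` layout, the function name `build_inference_prompt`, and the new test question about pens are assumptions for illustration, not a format prescribed by the paper.

```python
from typing import List, Tuple


def build_inference_prompt(exemplars: List[Tuple[str, str]], question: str) -> str:
    """Place annotated (question, answer) exemplars before the new question."""
    blocks = [f"Q: {q}\nA: {a}" for q, a in exemplars]
    # The final answer is left open for the model to complete.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


# The exemplar annotated in Step 3.
annotated = [
    (
        "John has 5 apples, and he gives 2 to Mary. How many apples does John have left?",
        "John starts with 5 apples. He gives away 2 apples. "
        "Therefore, he has 5 - 2 = 3 apples left.",
    ),
]

# Hypothetical new question to answer at inference time.
prompt = build_inference_prompt(
    annotated, "Sara has 8 pens and loses 3. How many pens does she have left?"
)
print(prompt)
```

The resulting string is sent to the model as a single prompt, so the annotated reasoning chain guides how the new question is answered.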

Tip

The code for Active Prompting is open-sourced and available for further research and implementation on GitHub.

What are the Experimental Results for Active Prompting?

In the paper's experiments, Active Prompting outperforms standard CoT and other baseline techniques on arithmetic, commonsense, and symbolic reasoning benchmarks.

  • Self-Consistency: Active-Prompt outperforms self-consistency across most tasks. Active-Prompt shows a 7.2% improvement over SC on arithmetic reasoning tasks. For commonsense and symbolic reasoning, Active-Prompt consistently outperforms SC across all tasks.
  • Chain-of-Thought: Compared to CoT, Active-Prompt demonstrates significant improvements. For instance, Active-Prompt scores 83.4% on GSM8K, compared to 63.1% by CoT. Across other datasets (e.g., ASDiv, SVAMP, AQUA), Active-Prompt outperforms CoT by margins ranging from 1.0% to 15.4%, showing the robustness of the active selection method.
  • Random Chain-of-Thought: Active-Prompt also shows consistent improvements over Random-CoT, with higher average performance across the datasets.
  • Automatic Chain-of-Thought: Though Auto-CoT produces decent results, particularly with code-davinci-002, Active-Prompt surpasses it in every task.

Limitations of Active Prompting

Despite its advantages, Active Prompting has some limitations:

  • Human Annotation Required: Some level of human involvement is needed to annotate the most uncertain questions.

  • Choice of Uncertainty Metric: The way uncertainty is measured can affect performance, so the metric should be chosen to suit the task at hand.

Conclusion

Active Prompting improves how well large language models solve complex reasoning problems. By focusing annotation on the questions the model is most uncertain about, it makes efficient use of human effort and produces exemplars targeted at the model's weak spots.

Footnotes

  1. Diao, S., Wang, P., Lin, Y., Pan, R., Liu, X., & Zhang, T. (2024). Active Prompting with Chain-of-Thought for Large Language Models. https://arxiv.org/abs/2302.12246

Copyright © 2024 Learn Prompting.