
🟦 Consistency-based Self-adaptive Prompting (COSP)

Last updated on October 3, 2024 by Valeriia Kuka
Figure: Overview of Consistency-based Self-adaptive Prompting (COSP)1

What is COSP?

Consistency-based Self-Adaptive Prompting (COSP)1 is a novel technique designed to improve the reasoning capabilities of large language models (LLMs) in zero-shot settings. LLMs have demonstrated impressive abilities, but their performance in reasoning tasks often varies significantly depending on the approach used.

Two main methods—few-shot prompting (providing the model with handpicked examples) and zero-shot chain-of-thought (CoT) prompting (triggering step-by-step reasoning)—have shown success, but each has limitations. COSP addresses these by automatically selecting useful in-context examples from the LLM’s own generated responses, without needing labeled data or handcrafted prompts.

Why is COSP Needed?

  • Few-shot prompting requires careful selection of examples, which is time-consuming and task-specific.
  • Zero-shot CoT prompting often underperforms due to lack of guidance, leading to spurious reasoning paths.
  • COSP solves these problems by leveraging the model's own generated outputs, selecting the most useful examples based on consistency, diversity, and minimal repetition.

How COSP Differs from Existing Techniques

Zero-Shot CoT vs. COSP

While Zero-shot CoT relies on trigger phrases alone to prompt the model, COSP takes this further by:

  • Generating multiple outputs for each question.
  • Selecting the best in-context examples based on consistency and diversity.
  • Using majority voting to improve the reliability of the final answer.

Few-Shot CoT vs. COSP

Few-shot CoT requires manually selecting a small number of example questions and answers to guide the model. This is effective but labor-intensive and not scalable across tasks. COSP achieves similar or better performance without any labeled data, automatically selecting relevant in-context examples from the model's own outputs.

Benefits and Applications

  • Higher accuracy: COSP consistently outperforms zero-shot and even few-shot baselines on reasoning tasks, improving accuracy by up to 15%.
  • No labeled data needed: COSP works without any labeled data or handcrafted examples, making it scalable and efficient.
  • Consistency-driven: By focusing on consistency and diversity, COSP improves the reliability of predictions in zero-shot scenarios.

How COSP Works

  • Stage 1: Generating Responses: The model generates multiple reasoning paths for each test question using Zero-shot CoT. These paths are then assessed for their reliability based on consistency across answers.
  • Stage 2: Selecting Demonstrations: The best responses are selected as in-context demonstrations based on criteria like consistency (whether the same answer is repeated), minimal repetition, and diversity of reasoning paths. These selected examples are then used to guide the model’s final prediction.
  • Majority Vote: The final prediction is chosen by majority vote across multiple generated reasoning paths (a minimal end-to-end sketch follows this list).
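
Putting the two stages and the vote together, here is a minimal Python sketch of the whole pipeline. The `sample_cot` and `extract_answer` helpers are assumptions (sketched in the step-by-step walkthrough below), and the demonstration score is reduced to raw answer agreement; the paper's actual criterion also weighs entropy, diversity, and repetition.

```python
from collections import Counter
from typing import Callable

def cosp(
    test_questions: list[str],
    sample_cot: Callable[[str, int], list[str]],  # n zero-shot CoT completions
    extract_answer: Callable[[str], str | None],  # final answer from one completion
    n_samples: int = 5,
    n_demos: int = 3,
) -> dict[str, str | None]:
    # Stage 1: sample multiple reasoning paths per question and score each
    # path by how often its answer agrees with the question's majority answer.
    candidates = []  # (agreement score, question, rationale)
    for q in test_questions:
        paths = sample_cot(q, n_samples)
        answers = [extract_answer(p) for p in paths]
        counts = Counter(a for a in answers if a is not None)
        if not counts:
            continue  # no parseable answer for this question
        majority, freq = counts.most_common(1)[0]
        for path, ans in zip(paths, answers):
            if ans == majority:  # keep only self-consistent paths
                candidates.append((freq / n_samples, q, path))

    # Stage 2: the highest-scoring paths become in-context demonstrations.
    demos = sorted(candidates, key=lambda c: -c[0])[:n_demos]
    prefix = "\n\n".join(f"Q: {q}\nA: {r}" for _, q, r in demos)

    # Final prediction: re-query with demonstrations prepended, then
    # majority-vote over the answers of the new reasoning paths.
    predictions: dict[str, str | None] = {}
    for q in test_questions:
        paths = sample_cot(f"{prefix}\n\nQ: {q}\nA:", n_samples)
        counts = Counter(a for p in paths if (a := extract_answer(p)) is not None)
        predictions[q] = counts.most_common(1)[0][0] if counts else None
    return predictions
```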

How to Use COSP

COSP can be applied to any reasoning-based task where zero-shot performance is needed. It requires:

  • Access to a large language model
  • Unlabeled test questions

COSP then proceeds in three steps:

1. Generate responses

Run the model multiple times on each test question using zero-shot CoT (a code sketch follows the prompt template below).

Prompt:

[Test question]

Let's think step by step.
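
In code, the sampling step might look like the following sketch, which assumes the OpenAI Python SDK as the backend (any chat-completion API works the same way; the model name and temperature are illustrative choices, not values from the paper):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def sample_cot(question: str, n: int = 5, model: str = "gpt-4o-mini") -> list[str]:
    """Sample n zero-shot CoT reasoning paths for one question."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": f"{question}\n\nLet's think step by step."}],
        temperature=0.7,  # nonzero temperature makes the paths differ
        n=n,              # number of independent completions
    )
    return [choice.message.content for choice in response.choices]
```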

For example:

Prompt:

Henry had 11 dollars. For his birthday, he got 18 more dollars but spent 10 on a new game. How much money does he have now?

Let's think step by step.

AI Outputs for this example:

1. "11 + 18 = 29, 29 - 10 = 19"
2. "Henry has 27 dollars."
3. "He has 11 + 18 - 10 = 19."
4. "He bought 11 games and added 18. Then subtracted 10 for the game."

2. Select demonstrations

COSP automatically selects the most consistent and diverse reasoning paths to use as in-context examples (a scoring sketch follows the selected example below).

Selected Example:

"11 + 18 = 29, 29 - 10 = 19."

3. Final prediction

COSP uses the selected examples to prompt the model again and chooses the final answer by majority vote, as sketched below.

Final Answer: "19 dollars."
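
Concretely, the final stage prepends the selected demonstration to the test question, re-samples reasoning paths, and majority-votes over the extracted answers. A toy sketch (the re-sampled answer values are illustrative):

```python
from collections import Counter

# Demonstration selected in step 2 (question plus its consistent rationale):
demo_q = ("Henry had 11 dollars. For his birthday, he got 18 more dollars "
          "but spent 10 on a new game. How much money does he have now?")
demo_a = "11 + 18 = 29, 29 - 10 = 19. The answer is 19."

def final_prompt(test_question: str) -> str:
    """Prepend the selected demonstration to the test question."""
    return (f"Q: {demo_q}\nA: {demo_a}\n\n"
            f"Q: {test_question}\nA: Let's think step by step.")

# Answers extracted from the re-sampled paths (illustrative values):
answers = ["19", "19", "27", "19", "19"]
print(Counter(answers).most_common(1)[0][0])  # -> '19'
```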

Results of COSP

COSP significantly improves performance on reasoning tasks compared to zero-shot CoT, and matches or even beats few-shot CoT on several of them. Here are results on four reasoning benchmarks:

| Task       | Zero-shot CoT | Few-shot CoT | COSP  |
|------------|---------------|--------------|-------|
| MultiArith | 67.2%         | 81.0%        | 85.0% |
| AddSub     | 69.1%         | 72.4%        | 78.9% |
| GSM-8K     | 20.9%         | 30.3%        | 30.2% |
| StrategyQA | 57.2%         | 67.9%        | 64.7% |

Conclusion

COSP offers a powerful solution for improving zero-shot reasoning in LLMs. It removes the need for manual example crafting, instead relying on the model’s own outputs to guide reasoning. By combining consistency, diversity, and repetition analysis, COSP leads to significant performance improvements across multiple reasoning tasks. This makes it a scalable and efficient approach for improving LLM reasoning in real-world applications.

Footnotes

  1. Wan, X., Sun, R., Dai, H., Arik, S. O., & Pfister, T. (2023). Better Zero-Shot Reasoning with Self-Adaptive Prompting. https://arxiv.org/abs/2305.14106
