
🟦 Consistency-based Self-adaptive Prompting (COSP)

Last updated on October 3, 2024 by Valeriia Kuka
Figure: Overview of Consistency-based Self-adaptive Prompting (COSP)1

What is COSP?

Consistency-based Self-Adaptive Prompting (COSP)1 is a novel technique designed to improve the reasoning capabilities of large language models (LLMs) in zero-shot settings. LLMs have demonstrated impressive abilities, but their performance in reasoning tasks often varies significantly depending on the approach used.

Two main methods—few-shot prompting (providing the model with handpicked examples) and zero-shot chain-of-thought (CoT) prompting (triggering step-by-step reasoning)—have shown success, but each has limitations. COSP addresses these by automatically selecting useful in-context examples from the LLM’s own generated responses, without needing labeled data or handcrafted prompts.

Why is COSP Needed?

  • Few-shot prompting requires careful selection of examples, which is time-consuming and task-specific.
  • Zero-shot CoT prompting often underperforms due to lack of guidance, leading to spurious reasoning paths.
  • COSP solves these problems by leveraging the model's own generated outputs, selecting the most useful examples based on consistency, diversity, and minimal repetition.

How COSP Differs from Existing Techniques

Zero-Shot CoT vs. COSP

While Zero-shot CoT relies on trigger phrases alone to prompt the model, COSP takes this further by:

  • Generating multiple outputs for each question.
  • Selecting the best in-context examples based on consistency and diversity.
  • Using majority voting to improve the reliability of the final answer.

Few-Shot CoT vs. COSP

Few-shot CoT requires manually selecting a small number of example questions and answers to guide the model. This is effective but labor-intensive and not scalable across tasks. COSP achieves similar or better performance without any labeled data, automatically selecting relevant in-context examples from the model's own outputs.

Benefits and Applications

  • Higher accuracy: COSP consistently outperforms zero-shot and even few-shot baselines on reasoning tasks, improving accuracy by up to 15%.
  • No labeled data needed: COSP works without any labeled data or handcrafted examples, making it scalable and efficient.
  • Consistency-driven: By focusing on consistency and diversity, COSP improves the reliability of predictions in zero-shot scenarios.

How COSP Works

  • Stage 1: Generating Responses: The model generates multiple reasoning paths for each test question using Zero-shot CoT. These paths are then assessed for their reliability based on consistency across answers.
  • Stage 2: Selecting Demonstrations: The best responses are selected as in-context demonstrations based on criteria like consistency (whether the same answer is repeated), minimal repetition, and diversity of reasoning paths. These selected examples are then used to guide the model’s final prediction.
  • Majority Vote: The final prediction is chosen by majority vote across multiple generated reasoning paths (a minimal end-to-end sketch follows this list).
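
Putting the two stages and the vote together, here is a minimal Python sketch of the whole pipeline. The `sample_cot` and `extract_answer` helpers are assumptions (sketched in the step-by-step walkthrough below), and the demonstration score is reduced to raw answer agreement; the paper's actual criterion also weighs entropy, diversity, and repetition.

```python
from collections import Counter
from typing import Callable

def cosp(
    test_questions: list[str],
    sample_cot: Callable[[str, int], list[str]],  # n zero-shot CoT completions
    extract_answer: Callable[[str], str | None],  # final answer from one completion
    n_samples: int = 5,
    n_demos: int = 3,
) -> dict[str, str | None]:
    # Stage 1: sample multiple reasoning paths per question and score each
    # path by how often its answer agrees with the question's majority answer.
    candidates = []  # (agreement score, question, rationale)
    for q in test_questions:
        paths = sample_cot(q, n_samples)
        answers = [extract_answer(p) for p in paths]
        counts = Counter(a for a in answers if a is not None)
        if not counts:
            continue  # no parseable answer for this question
        majority, freq = counts.most_common(1)[0]
        for path, ans in zip(paths, answers):
            if ans == majority:  # keep only self-consistent paths
                candidates.append((freq / n_samples, q, path))

    # Stage 2: the highest-scoring paths become in-context demonstrations.
    demos = sorted(candidates, key=lambda c: -c[0])[:n_demos]
    prefix = "\n\n".join(f"Q: {q}\nA: {r}" for _, q, r in demos)

    # Final prediction: re-query with demonstrations prepended, then
    # majority-vote over the answers of the new reasoning paths.
    predictions: dict[str, str | None] = {}
    for q in test_questions:
        paths = sample_cot(f"{prefix}\n\nQ: {q}\nA:", n_samples)
        counts = Counter(a for p in paths if (a := extract_answer(p)) is not None)
        predictions[q] = counts.most_common(1)[0][0] if counts else None
    return predictions
```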

How to Use COSP

COSP can be applied to any reasoning-based task where zero-shot performance is needed. It requires:

  • Access to a large language model
  • Unlabeled test questions

COSP then proceeds in three steps:

1. Generate responses

Run the model multiple times on each test question using zero-shot CoT (a code sketch follows the prompt template below).

Prompt:

[Test question]

Let's think step by step.
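
In code, the sampling step might look like the following sketch, which assumes the OpenAI Python SDK as the backend (any chat-completion API works the same way; the model name and temperature are illustrative choices, not values from the paper):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def sample_cot(question: str, n: int = 5, model: str = "gpt-4o-mini") -> list[str]:
    """Sample n zero-shot CoT reasoning paths for one question."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": f"{question}\n\nLet's think step by step."}],
        temperature=0.7,  # nonzero temperature makes the paths differ
        n=n,              # number of independent completions
    )
    return [choice.message.content for choice in response.choices]
```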

For example:

Prompt:

Henry had 11 dollars. For his birthday, he got 18 more dollars but spent 10 on a new game. How much money does he have now?

Let's think step by step.

AI Outputs for this example:

1. "11 + 18 = 29, 29 - 10 = 19"
2. "Henry has 27 dollars."
3. "He has 11 + 18 - 10 = 19."
4. "He bought 11 games and added 18. Then subtracted 10 for the game."

2. Select demonstrations

COSP automatically selects the most consistent and diverse reasoning paths to use as in-context examples (a scoring sketch follows the selected example below).

Selected Example:

"11 + 18 = 29, 29 - 10 = 19."

3. Final prediction

COSP uses the selected examples to prompt the model again and chooses the final answer by majority vote, as sketched below.

Final Answer: "19 dollars."
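
Concretely, the final stage prepends the selected demonstration to the test question, re-samples reasoning paths, and majority-votes over the extracted answers. A toy sketch (the re-sampled answer values are illustrative):

```python
from collections import Counter

# Demonstration selected in step 2 (question plus its consistent rationale):
demo_q = ("Henry had 11 dollars. For his birthday, he got 18 more dollars "
          "but spent 10 on a new game. How much money does he have now?")
demo_a = "11 + 18 = 29, 29 - 10 = 19. The answer is 19."

def final_prompt(test_question: str) -> str:
    """Prepend the selected demonstration to the test question."""
    return (f"Q: {demo_q}\nA: {demo_a}\n\n"
            f"Q: {test_question}\nA: Let's think step by step.")

# Answers extracted from the re-sampled paths (illustrative values):
answers = ["19", "19", "27", "19", "19"]
print(Counter(answers).most_common(1)[0][0])  # -> '19'
```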

Results of COSP

COSP significantly improves performance on reasoning tasks compared to zero-shot CoT, and matches or even beats few-shot CoT on several of them. Here are results on four reasoning benchmarks:

| Task       | Zero-shot CoT | Few-shot CoT | COSP  |
|------------|---------------|--------------|-------|
| MultiArith | 67.2%         | 81.0%        | 85.0% |
| AddSub     | 69.1%         | 72.4%        | 78.9% |
| GSM-8K     | 20.9%         | 30.3%        | 30.2% |
| StrategyQA | 57.2%         | 67.9%        | 64.7% |

Conclusion

COSP offers a powerful solution for improving zero-shot reasoning in LLMs. It removes the need for manual example crafting, instead relying on the model’s own outputs to guide reasoning. By combining consistency, diversity, and repetition analysis, COSP leads to significant performance improvements across multiple reasoning tasks. This makes it a scalable and efficient approach for improving LLM reasoning in real-world applications.

Footnotes

  1. Wan, X., Sun, R., Dai, H., Arik, S. O., & Pfister, T. (2023). Better Zero-Shot Reasoning with Self-Adaptive Prompting. https://arxiv.org/abs/2305.14106
