Universal Self-Consistency is a prompting technique used to refine and improve the accuracy of answers generated by a Large Language Model (LLM). It compiles multiple responses the model has previously given and then prompts the model to choose the best answer from among them.

USC builds on the concept of self-consistency, which uses multiple reasoning paths to find the most common response as a way to improve prediction confidence. Unlike standard self-consistency, which requires exact answers (like numbers) to tally votes, USC extends this approach to free-form responses by having the LLM select the most internally consistent answer from multiple generated outputs.

Benefits and Applications

Intuitive: Universal Self-Consistency is easy to grasp: sample several responses, then ask the model to pick the most consistent one.
Enhanced Reasoning: Because Universal Self-Consistency uses Chain-of-Thought (CoT) prompting to generate the answers that go into the final prompt, the candidates follow diverse reasoning paths, which gives the selection step more to compare.
Good for Free-form Text Generation: Universal Self-Consistency is particularly useful for free-form text generation, where answers cannot be tallied by exact match and the model instead chooses which text is best.

How USC Differs from Existing Techniques

USC enhances traditional self-consistency by supporting free-form answers, which is essential for tasks like summarization, open-ended Q&A, and code generation. Where previous methods required the extraction of identical answers, USC leverages LLMs to find internal consistency, making it more adaptable and reliable for diverse tasks.

Method	USC	Standard Self-Consistency	Execution-Based Self-Consistency
Output Requirement	Free-form or structured	Structured answers (e.g., single values)	Structured answers with execution results
Selection Approach	Consistency-based LLM selection	Answer extraction with majority vote	Code execution to find matching outputs
Applications	Open-ended Q&A, summarization, code generation	Math, logic, closed-form Q&A	Code generation
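
To make the contrast in the table concrete, here is a minimal sketch of the majority vote that standard self-consistency relies on. The function name `majority_vote` and the sample answers are illustrative assumptions, not taken from the paper. It shows why exact-match voting works for short extracted answers and breaks down for free-form text.

```python
from collections import Counter

def majority_vote(answers):
    """Return (most common answer, vote count) after light normalization."""
    normalized = [a.strip().lower() for a in answers]
    winner, count = Counter(normalized).most_common(1)[0]
    return winner, count

# Short, extractable answers vote cleanly.
numeric_answers = ["18", "18", "20", "18 "]
print(majority_vote(numeric_answers))

# Free-form answers agree in meaning but never match exactly,
# so every answer ends up with a single vote.
summaries = [
    "The report recommends more funding.",
    "More funding is recommended by the report.",
    "The report calls for increased funding.",
]
print(majority_vote(summaries))
```

In the second call no answer receives more than one vote, which is the gap USC fills by asking the model to judge consistency instead.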

How to Use Universal Self-Consistency

Step 1: Generate Multiple Responses with CoT: Begin by prompting the LLM several times on the same question, asking it to reason step by step each time. Record each response.

Prompt Template

[Prompted question or task]

Step 2: Select Consistent Answer: Compile all responses into a new prompt, asking the LLM to select the most accurate or reasonable answer.

Universal Self-Consistency Prompt Template

I have generated the following responses to the question: [Prompted question or task]

Response 1: [Response 1]
Response 2: [Response 2]
Response 3: [Response 3]
...

Evaluate these responses. Select the most consistent response based on majority consensus. Start your answer with "The most consistent response is Response X" (without quotes).
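
A minimal sketch of how this template could be filled in and how the model's reply could be read back. The helper names `build_usc_prompt` and `parse_selection` are assumptions made for illustration; the prompt wording is copied from the template above.

```python
import re

USC_INSTRUCTION = (
    "Evaluate these responses. Select the most consistent response based on "
    'majority consensus. Start your answer with "The most consistent response '
    'is Response X" (without quotes).'
)

def build_usc_prompt(question, responses):
    """Fill the USC template with the question and numbered responses."""
    lines = [
        f"I have generated the following responses to the question: {question}",
        "",
    ]
    for i, response in enumerate(responses, start=1):
        lines.append(f"Response {i}: {response}")
    lines += ["", USC_INSTRUCTION]
    return "\n".join(lines)

def parse_selection(reply, num_responses):
    """Return the 1-based index the model chose, or None if it cannot be read."""
    match = re.search(
        r"most consistent response is Response\s+(\d+)", reply, re.IGNORECASE
    )
    if not match:
        return None
    index = int(match.group(1))
    return index if 1 <= index <= num_responses else None

prompt = build_usc_prompt("What is 2 + 2?", ["It is 4.", "The answer is 4.", "5"])
print(prompt)
print(parse_selection("The most consistent response is Response 2.", 3))
```

Returning None for an unreadable or out-of-range reply leaves the fallback policy (retry, or default to the first response) to the caller.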

USC is particularly useful when you need accurate answers from an LLM but don’t have predefined answer structures or a way to validate answers externally.

1. For Mathematical Reasoning Tasks

Example Task: Solving math problems.
Method: USC generates multiple solutions and uses internal consistency rather than exact answer matching to select the best response.
Benefits: Maintains accuracy comparable to traditional self-consistency without requiring specific answer formatting.

2. For Code Generation

Example Task: Generating SQL queries or Python code.
Method: USC selects the code response that has the most consistent logic among generated samples.
Benefits: Matches execution-based voting accuracy without requiring actual code execution, saving computational resources.
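
For comparison, this is roughly what the execution-based voting that USC avoids looks like. It is a simplified sketch: the `execution_vote` function, the `solve` entry point, and the candidate programs are all hypothetical, and real systems would run generated code in a sandbox rather than with a bare `exec`.

```python
from collections import defaultdict

def execution_vote(candidates, test_inputs, entry_point="solve"):
    """Group candidate programs by their outputs; return one from the largest group."""
    groups = defaultdict(list)
    for source in candidates:
        namespace = {}
        try:
            # NOTE: never exec untrusted model output outside a sandbox.
            exec(source, namespace)
            outputs = tuple(repr(namespace[entry_point](x)) for x in test_inputs)
        except Exception:
            continue  # candidates that crash get no vote
        groups[outputs].append(source)
    if not groups:
        return None
    return max(groups.values(), key=len)[0]

candidates = [
    "def solve(n):\n    return sum(range(1, n + 1))",
    "def solve(n):\n    return n * (n + 1) // 2",
    "def solve(n):\n    return n * n",
]
best = execution_vote(candidates, test_inputs=[1, 3, 10])
print(best)
```

The first two candidates produce identical outputs and therefore outvote the third. USC replaces this run-and-compare step with a single selection prompt, so no test inputs or execution environment are needed.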

3. For Summarization and Open-Ended Generation

Example Task: Summarizing long documents or answering open-ended questions.
Method: USC evaluates multiple summaries and selects the most consistent one, without needing predefined answer structures.
Benefits: Improves quality by finding answers that align across diverse outputs, particularly useful for long and complex texts.

Example Use: Astronomy

Suppose we want to know the largest planet in our solar system. We start by generating multiple responses:

Prompt

Which planet in our solar system is the largest?

AI Output

Our solar system consists of eight planets. The planets in order of size are: Jupiter, Saturn, Uranus, Neptune, Earth, Venus, Mars, and Mercury.

Among these, Jupiter is the largest. So, the largest planet in our solar system is Jupiter.

Let's do this 5 times. These are our outputs:

Our solar system consists of eight planets. The planets in order of size are: Jupiter, Saturn, Uranus, Neptune, Earth, Venus, Mars, and Mercury. Among these, Jupiter is the largest. So, the largest planet in our solar system is Jupiter.
Our solar system consists of eight planets. The planets in order from the sun are Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune. If we compare the sizes of these planets, we find that Jupiter is the largest. Jupiter's diameter is about 142,984 kilometers, making it over twice as large as any other planet in our solar system. So, the largest planet in our solar system is Jupiter.
Our solar system has eight planets. The planets in order from the sun are Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune. When comparing the sizes of these planets, Jupiter stands out. Jupiter's diameter is about 11 times that of Earth, making it the largest planet in our solar system. So, the answer is Jupiter.
Our solar system has eight planets. The planets in order from the sun are Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune. If we compare the sizes of these planets, we find that Jupiter is significantly larger than the others. In fact, Jupiter is so large that over 1,300 Earths could fit inside it. Therefore, the largest planet in our solar system is Jupiter.
Our solar system has eight planets. The planets in order from the sun are Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune. If we compare the sizes of these planets, we find that: Mercury is the smallest. Venus and Earth are roughly the same size. Mars is about half the size of Earth. Jupiter is much larger than any of the inner planets. Saturn is the second largest planet. Uranus and Neptune are larger than the inner planets but smaller than Jupiter and Saturn. Therefore, the largest planet in our solar system is Jupiter.

Moving on to step 2, we now put it all into one prompt, asking the LLM to choose the best.

Prompt

I have generated the following responses to the question: Which planet in our solar system is the largest?

[list of the responses]

Evaluate these responses. Select the most consistent response based on majority consensus. Start your answer with "The most consistent response is Response X" (without quotes).

AI Output

The most consistent response is Response 4.

As a reminder, Response 4 was:

Our solar system has eight planets. The planets in order from the sun are Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune. If we compare the sizes of these planets, we find that Jupiter is significantly larger than the others. In fact, Jupiter is so large that over 1,300 Earths could fit inside it. Therefore, the largest planet in our solar system is Jupiter.
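
The two steps of this example can be tied together in one function. The sketch below assumes a hypothetical `generate` callable and uses a stub model with shortened canned answers so that it runs without an API. Falling back to the first response when the reply cannot be parsed is a design choice made here, not something the source specifies.

```python
import re
from itertools import cycle

def universal_self_consistency(question, generate, n=5):
    """Sample n responses, then ask the model to select the most consistent one."""
    responses = [generate(question) for _ in range(n)]
    listing = "\n".join(f"Response {i}: {r}" for i, r in enumerate(responses, 1))
    selection_prompt = (
        f"I have generated the following responses to the question: {question}\n\n"
        f"{listing}\n\n"
        "Evaluate these responses. Select the most consistent response based on "
        'majority consensus. Start your answer with "The most consistent response '
        'is Response X" (without quotes).'
    )
    reply = generate(selection_prompt)
    match = re.search(r"Response\s+(\d+)", reply)
    index = int(match.group(1)) if match else 1
    if not 1 <= index <= len(responses):
        index = 1  # assumption: fall back to the first sample
    return responses[index - 1]

# Stub model: canned answers for sampling, a fixed choice for selection.
CANNED = [
    "Ordered by size, Jupiter comes first. The answer is Jupiter.",
    "Jupiter's diameter is about 142,984 kilometers. The answer is Jupiter.",
    "Jupiter's diameter is about 11 times that of Earth. The answer is Jupiter.",
    "Over 1,300 Earths could fit inside Jupiter. The answer is Jupiter.",
    "Saturn is second largest, so the largest is Jupiter.",
]
_answers = cycle(CANNED)

def stub_generate(prompt):
    if prompt.startswith("I have generated"):
        return "The most consistent response is Response 4."
    return next(_answers)

chosen = universal_self_consistency(
    "Which planet in our solar system is the largest?", stub_generate
)
print(chosen)
```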

Results of USC

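USC was tested across multiple benchmarks. On tasks where standard self-consistency applies (math and code generation), USC performs comparably to it; on open-ended tasks where answer voting is not applicable, USC improves on both greedy decoding and random selection. Below are results from common benchmark tasks.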

Task	Greedy Decoding	Random Selection	Standard Self-Consistency	USC
Math (GSM8K)	85.7%	82.9%	90.4%	90.2%
Code Generation (ARCADE)	26.0%	26.8%	30.3%	30.1%
Summarization (GovReport)	ROUGE-1: 38.8	ROUGE-1: 38.5	Not Applicable	ROUGE-1: 40.2
TruthfulQA (Open Q&A)	62.1%	62.9%	Not Applicable	67.7% (truthfulness)

These results highlight USC’s capacity to significantly improve LLM-generated outputs on open-ended tasks where answer extraction for voting is difficult or not feasible.
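
The gaps in the table can be read off with a few lines of arithmetic. The scores below are copied from the table; the dictionary layout and the `gaps` helper are illustrative assumptions. Note that the GovReport row is a ROUGE-1 score while the other rows are percentages, so differences are comparable only within a row.

```python
# Scores copied from the results table above.
results = {
    "Math (GSM8K)": {"greedy": 85.7, "sc": 90.4, "usc": 90.2},
    "Code Generation (ARCADE)": {"greedy": 26.0, "sc": 30.3, "usc": 30.1},
    "Summarization (GovReport)": {"greedy": 38.8, "sc": None, "usc": 40.2},
    "TruthfulQA (Open Q&A)": {"greedy": 62.1, "sc": None, "usc": 67.7},
}

def gaps(row):
    """Return (USC minus greedy, USC minus standard SC or None)."""
    gain = round(row["usc"] - row["greedy"], 1)
    vs_sc = None if row["sc"] is None else round(row["usc"] - row["sc"], 1)
    return gain, vs_sc

for task, row in results.items():
    gain, vs_sc = gaps(row)
    sc_text = "n/a" if vs_sc is None else f"{vs_sc:+.1f}"
    print(f"{task}: USC vs greedy {gain:+.1f}, USC vs standard SC {sc_text}")
```

On the two tasks where standard self-consistency applies, USC trails it by 0.2 points; on the open-ended tasks it gains 1.4 ROUGE-1 points and 5.6 percentage points over greedy decoding.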

Conclusion

Universal Self-Consistency is a powerful, intuitive method for improving the accuracy and reliability of LLM responses to a given prompt by compiling multiple responses and letting the model itself decide which is best. It costs several sampling calls plus one selection call per question, but it needs no answer extraction, code execution, or external validation, and it can be highly rewarding, particularly for prompts that involve free-form writing, such as essays.

Footnotes

Chen, X., Aksitov, R., Alon, U., Ren, J., Xiao, K., Yin, P., Prakash, S., Sutton, C., Wang, X., & Zhou, D. (2023). Universal Self-Consistency for Large Language Model Generation. https://arxiv.org/abs/2311.17311

Andres Caceres

Andres Caceres, a documentation writer at Learn Prompting, has a passion for AI, math, and education. Outside of work, he enjoys playing soccer and tennis, spending time with his three huskies, and tutoring. His enthusiasm for learning and sharing knowledge drives his dedication to making complex concepts more accessible through clear and concise documentation.
