
DiVeRSe (Diverse Verifier on Reasoning Step)

Last updated on October 3, 2024

Overview of DiVeRSe (Diverse Verifier on Reasoning Step)

What is DiVeRSe?

DiVeRSe (Diverse Verifier on Reasoning Step) is a method designed to enhance the reasoning abilities of Large Language Models (LLMs) by improving the way they handle multi-step problems.

LLMs still struggle with complex tasks like arithmetic word problems. DiVeRSe tackles this with three major components:

  1. Diverse Prompts: It generates varied prompts to encourage different reasoning paths for the same question.
  2. Verifier: A model that checks the accuracy of reasoning paths and uses a weighted voting scheme to filter out incorrect answers.
  3. Step-Aware Verification: This verifies each reasoning step independently, identifying where mistakes occur and improving the model's reasoning process step by step.

How DiVeRSe Works

  1. Diverse Prompts: DiVeRSe generates multiple reasoning paths by sampling from different prompts. It randomly selects M_1 different prompts for each question and then samples M_2 reasoning paths from each prompt using sampling decoding, yielding M = M_1 × M_2 diverse reasoning paths per question (a sketch of this sampling step follows this list).

  2. Voting Verifier: Once the model has generated several reasoning paths, the voting verifier comes into play. It evaluates each reasoning path, scoring how likely it is to be correct. This is done using a pre-trained model which takes into account both the question and the reasoning steps. The verifier guides a voting mechanism, weighting paths based on their probability of being correct rather than simply counting how many paths lead to a specific answer.

  3. Step-Aware Verification: A major innovation of DiVeRSe is its step-aware verifier, which checks the correctness of each individual step in the reasoning chain. Often, some steps may be correct while others are wrong, leading to an incorrect final answer. DiVeRSe identifies these mistakes by labeling each step and comparing it to known correct reasoning patterns. This helps improve the overall reasoning process by pinpointing where the error occurs and correcting it.
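
To make the first component concrete, here is a minimal sketch of diverse path generation. It assumes the OpenAI Python SDK as the sampling backend with a placeholder model name; any LLM API that supports temperature-based sampling works the same way, and the two prompt variants are only illustrative.

```python
# Minimal sketch of DiVeRSe-style diverse path generation (M = M_1 x M_2).
# Assumes the OpenAI Python SDK and an API key in OPENAI_API_KEY;
# the model name and the two prompt templates are illustrative only.
from openai import OpenAI

client = OpenAI()

QUESTION = (
    "Janet's ducks lay 16 eggs per day. She eats 3 for breakfast every morning "
    "and uses 4 eggs for baking muffins. She sells the remaining eggs for $2 each. "
    "How much money does she make per day?"
)

# M_1 = 2 different prompts (e.g., different phrasings or few-shot exemplars).
prompts = [
    f"Solve the problem step by step.\n\nQ: {QUESTION}\nA:",
    f"Let's reason carefully, one step per line.\n\nQ: {QUESTION}\nA:",
]

M2 = 3  # reasoning paths sampled per prompt

reasoning_paths = []
for prompt in prompts:
    response = client.chat.completions.create(
        model="gpt-4o-mini",          # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0.8,              # sampling decoding -> diverse paths
        n=M2,                         # M_2 samples for this prompt
    )
    reasoning_paths.extend(choice.message.content for choice in response.choices)

print(f"Collected {len(reasoning_paths)} diverse reasoning paths")  # M_1 * M_2 = 6
```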

How to Use DiVeRSe

DiVeRSe can be applied to a range of reasoning tasks, especially those that require step-by-step logic. Here’s how to use it on a math problem.

1. Generate Diverse Reasoning Paths

Sample multiple reasoning paths for a given question by generating different prompts.

Prompt:

Q: Janet’s ducks lay 16 eggs per day. She eats 3 for breakfast every morning and uses 4 eggs for baking muffins. She sells the remaining eggs for $2 each. How much money does she make per day?

A:

Generated Reasoning Paths:

[Sample 1] 16 - 3 = 13 eggs left, 13 - 4 = 9 eggs left. She sells 9 eggs for $2 each, so 9 * 2 = $18.
[Sample 2] 16 - 3 = 13 eggs, 13 - 4 = 9 eggs, 9 eggs sold for $2 each, so $18.

2. Score Reasoning Paths

Use the verifier to score each path based on its likelihood of being correct.

- Path 1: 91.2% likelihood of being correct.
- Path 2: 88.5% likelihood of being correct.
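
The percentages above come from a trained verifier. As a rough sketch, assuming you have fine-tuned a binary correct/incorrect classifier on (question, reasoning path) pairs and saved it as a Hugging Face text-classification model (the checkpoint name and label names below are hypothetical), scoring could look like this:

```python
# Rough sketch of verifier scoring; the checkpoint and its label names are
# hypothetical stand-ins for a verifier fine-tuned on (question, path) pairs.
from transformers import pipeline

verifier = pipeline("text-classification", model="your-org/diverse-verifier")

def score_path(question: str, reasoning_path: str) -> float:
    """Return the verifier's estimated probability that this path is correct."""
    result = verifier(f"{question} [SEP] {reasoning_path}", truncation=True)[0]
    return result["score"] if result["label"] == "correct" else 1.0 - result["score"]

question = "Janet's ducks lay 16 eggs per day. She eats 3 for breakfast every morning and uses 4 eggs for baking muffins. She sells the remaining eggs for $2 each. How much money does she make per day?"
paths = [
    "16 - 3 = 13 eggs left, 13 - 4 = 9 eggs left. She sells 9 eggs for $2 each, so 9 * 2 = $18.",
    "16 - 3 = 13 eggs, 13 - 4 = 9 eggs, 9 eggs sold for $2 each, so $18.",
]
scores = [score_path(question, p) for p in paths]  # e.g. [0.912, 0.885]
```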

3. Step-Aware Verification

Apply step-aware verification to check the correctness of individual reasoning steps.

- Step 1: Correct subtraction (16 - 3 = 13).
- Step 2: Correct subtraction (13 - 4 = 9).
- Step 3: Correct multiplication (9 * 2 = 18).
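
In DiVeRSe, these step-level labels come from a trained step-aware verifier. The toy checker below only illustrates the idea of verifying each step independently, by splitting a path into steps and re-checking the explicit arithmetic each one contains.

```python
# Toy illustration of step-aware verification: the real step verifier is a
# trained model; this stand-in just re-checks explicit arithmetic per step.
import operator
import re

ARITH = re.compile(r"(\d+)\s*([-+*/])\s*(\d+)\s*=\s*\$?(\d+)")
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def label_steps(reasoning_path: str) -> list[tuple[str, bool]]:
    """Split a reasoning path into steps and label each checkable step."""
    labels = []
    for step in re.split(r"[,.]\s*", reasoning_path):
        match = ARITH.search(step)
        if not match:
            continue  # no explicit arithmetic in this step, nothing to check
        a, op, b, claimed = match.groups()
        labels.append((step.strip(), OPS[op](int(a), int(b)) == int(claimed)))
    return labels

path = "16 - 3 = 13 eggs left, 13 - 4 = 9 eggs left. She sells 9 eggs for $2 each, so 9 * 2 = $18."
for step, is_correct in label_steps(path):
    print("correct" if is_correct else "incorrect", "|", step)
```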

4. Final Answer

Use weighted voting to arrive at the final answer, selecting the most likely correct answer based on the verified reasoning paths.

Final Answer: $18.
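
With the example scores from step 2, the weighted vote is simply the sum of verifier scores for each candidate answer; a minimal sketch:

```python
# Minimal sketch of verifier-weighted voting over the scored example paths.
from collections import defaultdict

# (answer extracted from the path, verifier score) for each reasoning path
scored_paths = [("18", 0.912), ("18", 0.885)]

votes = defaultdict(float)
for answer, score in scored_paths:
    votes[answer] += score  # each path votes with weight P(path is correct)

final_answer = max(votes, key=votes.get)
print(f"Final answer: ${final_answer}")  # -> Final answer: $18
```
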
Tip

Access the open-source code.

Results of DiVeRSe

DiVeRSe was evaluated on several reasoning tasks, including arithmetic reasoning (e.g., GSM8K, MultiArith), commonsense reasoning (e.g., CommonsenseQA), and inductive reasoning (e.g., CLUTRR). The method achieved state-of-the-art results on many of these benchmarks, outperforming previous approaches like self-consistency and greedy decoding.

| Task       | Previous SOTA | Self-Consistency | DiVeRSe |
|------------|---------------|------------------|---------|
| GSM8K      | 74.4%         | 76.7%            | 82.3%   |
| ASDiv      | 81.9%         | 86.2%            | 88.7%   |
| MultiArith | 99.3%         | 98.6%            | 99.8%   |
| SVAMP      | 86.6%         | 85.8%            | 87.0%   |
| SingleEq   | 79.5%         | 93.7%            | 94.9%   |
| CLUTRR     | 67.0%         | 35.6%            | 95.9%   |

Conclusion

DiVeRSe offers a powerful method to enhance the reasoning abilities of large language models by leveraging diverse prompts, verifier-based scoring, and step-aware verification. This approach not only improves overall accuracy but also provides finer control over the reasoning process, allowing for more reliable and interpretable results. As LLMs continue to evolve, DiVeRSe represents a step forward in making these models more capable and trustworthy in complex reasoning tasks.

Valeriia Kuka

Valeriia Kuka, Head of Content at Learn Prompting, is passionate about making AI and ML accessible. Valeriia previously grew a 60K+ follower AI-focused social media account, earning reposts from Stanford NLP, Amazon Research, Hugging Face, and AI researchers. She has also worked with AI/ML newsletters and global communities with 100K+ members and authored clear and concise explainers and historical articles.

Footnotes

  1. Li, Y., Lin, Z., Zhang, S., Fu, Q., Chen, B., Lou, J.-G., & Chen, W. (2023). Making Large Language Models Better Reasoners with Step-Aware Verifier. https://arxiv.org/abs/2206.02336
