
🟦 DiVeRSe (Diverse Verifier on Reasoning Step)

Last updated on October 3, 2024 by Valeriia Kuka
Figure: Overview of DiVeRSe (Diverse Verifier on Reasoning Step)1

What is DiVeRSe?

DiVeRSe (Diverse Verifier on Reasoning Steps)1 is a method designed to enhance the reasoning abilities of large language models (LLMs) by improving the way they handle multi-step problems.

LLMs still struggle with complex multi-step tasks such as arithmetic word problems. DiVeRSe tackles this by combining three major components:

  1. Diverse Prompts: It generates varied prompts to encourage different reasoning paths for the same question.
  2. Verifier: A model that checks the accuracy of reasoning paths and uses a weighted voting scheme to filter out incorrect answers.
  3. Step-Aware Verification: This verifies each reasoning step independently, identifying where mistakes occur and improving the model's reasoning process step by step.

How DiVeRSe Works

  1. Diverse Prompts: DiVeRSe generates multiple reasoning paths by sampling from different prompts. It randomly selects $M_1$ different prompts for each question and then samples $M_2$ reasoning paths for each prompt using sampling decoding. This yields $M = M_1 \times M_2$ diverse reasoning paths per question.

  2. Voting Verifier: Once the model has generated several reasoning paths, the voting verifier comes into play. It evaluates each reasoning path and scores how likely it is to be correct, using a pre-trained model that takes into account both the question and the reasoning steps. The verifier then guides a voting mechanism, weighting paths by their probability of being correct rather than simply counting how many paths lead to a specific answer (the selection rule is sketched right after this list).

  3. Step-Aware Verification: A major innovation of DiVeRSe is its step-aware verifier, which checks the correctness of each individual step in the reasoning chain. Often, some steps may be correct while others are wrong, leading to an incorrect final answer. DiVeRSe identifies these mistakes by labeling each step and comparing it to known correct reasoning patterns. This helps improve the overall reasoning process by pinpointing where the error occurs and correcting it.
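In symbols, the voting verifier's selection rule can be written as a weighted vote. This is a sketch using my own notation (not necessarily the paper's): $x$ is the question, $z_i$ the $i$-th reasoning path, $y_i$ its final answer, and $f(x, z_i, y_i)$ the verifier's estimated probability that the path is correct.

$$\hat{y} = \arg\max_{y} \sum_{i=1}^{M} \mathbb{1}\left[y_i = y\right] \cdot f(x, z_i, y_i)$$

Plain majority voting, as in self-consistency, is the special case where every path gets weight 1.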

How to Use DiVeRSe

DiVeRSe can be applied to a range of reasoning tasks, especially those that require step-by-step logic. Here’s how to use it on a math problem.

1. Generate Diverse Reasoning Paths

Sample multiple reasoning paths for a given question, using several different prompts.

Prompt:


Q: Janet’s ducks lay 16 eggs per day. She eats 3 for breakfast every morning and uses 4 eggs for baking muffins. She sells the remaining eggs for $2 each. How much money does she make per day?

A:

Generated Reasoning Paths:

[Sample 1] 16 - 3 = 13 eggs left, 13 - 4 = 9 eggs left. She sells 9 eggs for $2 each, so 9 * 2 = $18.
[Sample 2] 16 - 3 = 13 eggs, 13 - 4 = 9 eggs, 9 eggs sold for $2 each, so $18.
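Below is a minimal Python sketch of this sampling step. The `sample_fn` argument is a placeholder for whatever completion call you use (it should accept a prompt string and a temperature and return one sampled completion); the function and parameter names here are illustrative, not from the DiVeRSe codebase.

```python
import random

def generate_reasoning_paths(question, prompt_pool, m1, m2, sample_fn, temperature=0.8):
    """Sample M = m1 * m2 diverse reasoning paths for one question.

    prompt_pool: list of few-shot prompt strings (worked examples).
    sample_fn:   callable(prompt_text, temperature) -> one completion string.
    """
    chosen_prompts = random.sample(prompt_pool, m1)    # M1 distinct prompts
    paths = []
    for prompt in chosen_prompts:
        full_prompt = f"{prompt}\n\nQ: {question}\nA:"
        for _ in range(m2):                            # M2 samples per prompt
            paths.append(sample_fn(full_prompt, temperature))
    return paths                                       # len(paths) == m1 * m2
```

For example, `m1 = 5` and `m2 = 20` would give 100 reasoning paths per question.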

2. Score Reasoning Paths

Use the verifier to score each path based on its likelihood of being correct.

- Path 1: 91.2% probability of being correct.
- Path 2: 88.5% probability of being correct.
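A sketch of this scoring step, with the verifier abstracted away. Here `verifier_fn` stands in for a fine-tuned verifier (for example, a binary classifier over the concatenated question and reasoning path) that returns the probability the path is correct; the names are illustrative, not the paper's API.

```python
def score_paths(question, paths, verifier_fn):
    """Return (path, score) pairs, where score is the verifier's
    estimated probability that the reasoning path is correct."""
    scored = []
    for path in paths:
        verifier_input = f"{question} {path}"  # question + full reasoning chain
        scored.append((path, verifier_fn(verifier_input)))
    return scored
```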

3. Step-Aware Verification

Apply step-aware verification to check the correctness of individual reasoning steps.

- Step 1: Correct subtraction (16 - 3 = 13).
- Step 2: Correct subtraction (13 - 4 = 9).
- Step 3: Correct multiplication (9 * 2 = 18).
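A rough sketch of step-aware verification. It splits a path into steps naively at sentence boundaries and assumes a `step_verifier_fn` that scores a single step given the question and the steps so far; combining the step scores with the path score by taking the minimum is a simplification for illustration, not necessarily the paper's exact formulation.

```python
def step_aware_score(question, path, path_score, step_verifier_fn):
    """Combine a path-level score with per-step correctness scores."""
    steps = [s.strip() for s in path.split(".") if s.strip()]   # naive step split
    step_scores = []
    context = question
    for step in steps:
        step_scores.append(step_verifier_fn(context, step))     # score step in context
        context = f"{context} {step}."                          # extend the context
    # Penalize paths whose weakest step looks wrong (simplified combination).
    return path_score * min(step_scores) if step_scores else path_score
```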

4. Final Answer

Use weighted voting to arrive at the final answer, selecting the most likely correct answer based on the verified reasoning paths.

Final Answer: $18.
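A final sketch of the weighted vote itself. Both sampled paths above reach $18, so to show how the weighting matters, a third, hypothetical path with a wrong answer and a low verifier score is included.

```python
from collections import defaultdict

def weighted_vote(scored_answers):
    """scored_answers: list of (final_answer, verifier_score) pairs."""
    totals = defaultdict(float)
    for answer, score in scored_answers:
        totals[answer] += score           # sum verifier scores per distinct answer
    return max(totals, key=totals.get)    # answer with the largest total weight

# Scores from the walkthrough above; the third entry is hypothetical.
print(weighted_vote([(18, 0.912), (18, 0.885), (26, 0.31)]))  # -> 18
```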
Tip

You can find the open-source code here.

Results of DiVeRSe

DiVeRSe was evaluated on several reasoning tasks, including arithmetic reasoning (e.g., GSM8K, MultiArith), commonsense reasoning (e.g., CommonsenseQA), and inductive reasoning (e.g., CLUTRR). The method achieved state-of-the-art results on many of these benchmarks, outperforming previous approaches like self-consistency and greedy decoding.

| Task | Previous SOTA | Self-Consistency | DiVeRSe |
|---|---|---|---|
| GSM8K | 74.4% | 76.7% | 82.3% |
| ASDiv | 81.9% | 86.2% | 88.7% |
| MultiArith | 99.3% | 98.6% | 99.8% |
| SVAMP | 86.6% | 85.8% | 87.0% |
| SingleEq | 79.5% | 93.7% | 94.9% |
| CLUTRR | 67.0% | 35.6% | 95.9% |

Conclusion

DiVeRSe offers a powerful method to enhance the reasoning abilities of large language models by leveraging diverse prompts, verifier-based scoring, and step-aware verification. This approach not only improves overall accuracy but also provides finer control over the reasoning process, allowing for more reliable and interpretable results. As LLMs continue to evolve, DiVeRSe represents a step forward in making these models more capable and trustworthy in complex reasoning tasks.

Footnotes

  1. Li, Y., Lin, Z., Zhang, S., Fu, Q., Chen, B., Lou, J.-G., & Chen, W. (2023). Making Large Language Models Better Reasoners with Step-Aware Verifier. https://arxiv.org/abs/2206.02336
