Last updated on September 27, 2024
Large Language Models (LLMs) often generate responses that look plausible but are incorrect. This is known as hallucination, and it remains an unresolved problem. Hallucinations are more common for lesser-known facts, or facts that appear infrequently in the training corpus, and when the model generates long-form text spanning multiple sentences or paragraphs. For reasoning tasks, Chain-of-Thought (CoT) prompting improves response quality by encouraging the model to write out its reasoning before producing the final answer. Chain-of-Verification (CoVe) follows a similar idea and uses verification questions to improve the quality of the response.
Chain-of-Verification (CoVe) prompting is a self-critique technique that improves the LLM's initial response. The model first plans verification questions to check its own work, then answers those questions systematically, and finally produces a refined response. Each verification question usually has a short answer that is easier to get right than the full initial answer. The initial answer is improved by feeding the verification questions and their answers back to the LLM.
Here is an example employing CoVe to refine the original answer:
Query: Name some politicians who were born in NY, New York.

Baseline response: Here are some politicians who were born in NY, New York:
Hillary Clinton - former secretary of state and former Democratic presidential nominee
Donald Trump - former president of the United States
Michael Bloomberg - former Mayor of New York City and former Democratic presidential candidate
...

Verification questions:
Where was Hillary Clinton born?
Where was Donald Trump born?
Where was Michael Bloomberg born?
...

Verification answers:
Hillary Clinton was born in Chicago, Illinois, United States on October 26, 1947.
Donald Trump was born on June 14, 1946, in Queens, New York City, New York, United States.
Michael Bloomberg was born on February 14, 1942, in Boston, Massachusetts, United States.
...

Final verified response: Here are some politicians who were born in NY, New York:
Donald Trump - former president of the United States
Alexandria Ocasio-Cortez - Democratic member of the U.S. House of Representatives
...
As the example shows, the model answers the individual verification questions more accurately than it states the same facts in the original long-form answer.
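Answer generation using CoVe is a four-step process: baseline response generation, planning verifications, verification execution, and final response generation.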
Let's go through an example to understand the process better.
Let's say we want to find the names of politicians born in New York City (NYC).
Step 1: Baseline response generation
Prompt the model to get the list of politicians born in NYC.
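Step 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `llm` is a hypothetical callable standing in for whatever model client you use, `fake_llm` simply returns the baseline list from the example above, and the prompt wording and comma-splitting are illustrative choices.

```python
from typing import Callable, List

# Hypothetical model interface: prompt string in, completion string out.
LLM = Callable[[str], str]


def baseline_response(llm: LLM, query: str) -> List[str]:
    """Step 1: ask the question directly and split the list-style answer into items."""
    prompt = (
        "Answer the question with a comma-separated list of names.\n"
        f"Question: {query}\n"
        "Answer:"
    )
    completion = llm(prompt)
    # Keep only non-empty, whitespace-trimmed items.
    return [item.strip() for item in completion.split(",") if item.strip()]


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model; returns a hallucination-prone baseline."""
    return " Hillary Clinton, Donald Trump, Michael Bloomberg\n"


names = baseline_response(fake_llm, "Name some politicians who were born in New York City.")
print(names)
```

Splitting the answer into items at this stage makes it easy to plan one verification question per claimed politician in the next step.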
Step 2: Planning verifications
Generate a list of verification questions that check the facts stated in the baseline response.
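One way to sketch Step 2 is to let the model write the verification questions and then parse its output. The template text, the `plan_verifications` helper, and the list-marker regex are illustrative assumptions, and `fake_llm` just returns canned questions in place of a real model.

```python
import re
from typing import Callable, List

# Hypothetical model interface: prompt string in, completion string out.
LLM = Callable[[str], str]

PLAN_TEMPLATE = (
    "Here is a question and a draft answer.\n"
    "Question: {query}\n"
    "Draft answer: {baseline}\n"
    "Write one verification question per line that checks a fact in the draft answer."
)


def plan_verifications(llm: LLM, query: str, baseline: str) -> List[str]:
    """Step 2: ask the model to plan verification questions, then parse them."""
    completion = llm(PLAN_TEMPLATE.format(query=query, baseline=baseline))
    questions = []
    for line in completion.splitlines():
        # Drop list markers such as "1.", "2)", "-" or "*" that models often add.
        text = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()
        if text.endswith("?"):
            questions.append(text)
    return questions


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model, with mixed list markers and a trailing remark."""
    return (
        "1. Where was Hillary Clinton born?\n"
        "2) Where was Donald Trump born?\n"
        "- Where was Michael Bloomberg born?\n"
        "These questions cover every name.\n"
    )


questions = plan_verifications(
    fake_llm,
    "Name some politicians who were born in New York City.",
    "Hillary Clinton, Donald Trump, Michael Bloomberg",
)
print(questions)
```

Keeping only lines that end in a question mark discards any commentary the model adds around the questions.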
Step 3: Verification execution
Answer each of the verification questions.
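A sketch of Step 3 follows. It answers each question in a separate prompt that leaves out the baseline response, so the model cannot simply repeat its earlier claims; this corresponds to what the CoVe paper calls the factored variant. The `execute_verifications` helper and the lookup-table `fake_llm` are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical model interface: prompt string in, completion string out.
LLM = Callable[[str], str]


def execute_verifications(llm: LLM, questions: List[str]) -> Dict[str, str]:
    """Step 3: answer each verification question in its own prompt.

    The baseline response is deliberately left out of these prompts so the model
    cannot copy its earlier (possibly hallucinated) claims.
    """
    answers = {}
    for question in questions:
        prompt = f"Answer the question concisely.\nQuestion: {question}\nAnswer:"
        answers[question] = llm(prompt).strip()
    return answers


# Stand-in "model" backed by a lookup table, for illustration only.
FACTS = {
    "Where was Hillary Clinton born?": "Chicago, Illinois",
    "Where was Donald Trump born?": "Queens, New York City",
    "Where was Michael Bloomberg born?": "Boston, Massachusetts",
}
seen_prompts: List[str] = []


def fake_llm(prompt: str) -> str:
    seen_prompts.append(prompt)  # record prompts to show one question per call
    for question, answer in FACTS.items():
        if question in prompt:
            return " " + answer
    return "I don't know."


answers = execute_verifications(fake_llm, list(FACTS))
for q, a in answers.items():
    print(q, "->", a)
```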
Step 4: Final response generation
Use the answers from the previous step to refine the original answer.
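Step 4 can be sketched as a prompt that places the draft answer next to the verification questions and answers. The prompt wording and the `build_final_prompt` and `final_response` helper names are assumptions; a real model client would be passed as the `llm` argument.

```python
from typing import Callable, Dict

# Hypothetical model interface: prompt string in, completion string out.
LLM = Callable[[str], str]


def build_final_prompt(query: str, baseline: str, verifications: Dict[str, str]) -> str:
    """Step 4: combine the draft answer with the verification Q&A as feedback."""
    qa_lines = "\n".join(f"Q: {q}\nA: {a}" for q, a in verifications.items())
    return (
        f"Question: {query}\n"
        f"Draft answer: {baseline}\n"
        "Verification questions and answers:\n"
        f"{qa_lines}\n"
        "Rewrite the draft answer, keeping only claims consistent with the "
        "verification answers.\n"
        "Final answer:"
    )


def final_response(llm: LLM, query: str, baseline: str, verifications: Dict[str, str]) -> str:
    """Send the revision prompt to the model and return its trimmed answer."""
    return llm(build_final_prompt(query, baseline, verifications)).strip()


prompt = build_final_prompt(
    "Name some politicians who were born in New York City.",
    "Hillary Clinton, Donald Trump, Michael Bloomberg",
    {
        "Where was Hillary Clinton born?": "Chicago, Illinois",
        "Where was Donald Trump born?": "Queens, New York City",
    },
)
print(prompt)
```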
As expected, the final refined response contains only the politicians who were born in NYC, as confirmed by the answers to the verification questions in Step 3.
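Putting the four steps together, a compact end-to-end sketch might look like this. It assumes a single hypothetical `llm` callable for every step, and `scripted_llm` returns canned outputs keyed on the prompt text so the flow can run without a real model. With n verification questions, the sketch makes n + 3 model calls.

```python
from typing import Callable, List

# Hypothetical model interface: prompt string in, completion string out.
LLM = Callable[[str], str]


def chain_of_verification(llm: LLM, query: str) -> str:
    """Run the four CoVe steps with one model; the prompts are illustrative."""
    # Step 1: baseline response.
    baseline = llm(f"Question: {query}\nAnswer:").strip()
    # Step 2: plan verification questions, one per line.
    plan = llm(
        f"Question: {query}\nDraft answer: {baseline}\n"
        "List verification questions, one per line:"
    )
    questions = [line.strip() for line in plan.splitlines() if line.strip().endswith("?")]
    # Step 3: answer each question independently, without the draft in context.
    qa = [(q, llm(f"Question: {q}\nAnswer:").strip()) for q in questions]
    # Step 4: revise the draft using the verification answers.
    evidence = "\n".join(f"Q: {q}\nA: {a}" for q, a in qa)
    return llm(
        f"Question: {query}\nDraft answer: {baseline}\n{evidence}\n"
        "Revised answer consistent with the verification answers:"
    ).strip()


calls: List[str] = []


def scripted_llm(prompt: str) -> str:
    """Stand-in model that returns canned outputs keyed on the prompt text."""
    calls.append(prompt)
    if "Revised answer" in prompt:
        return "Donald Trump"
    if "List verification questions" in prompt:
        return "Where was Hillary Clinton born?\nWhere was Donald Trump born?"
    if "Where was Hillary Clinton born?" in prompt:
        return "Chicago, Illinois"
    if "Where was Donald Trump born?" in prompt:
        return "Queens, New York City"
    return "Hillary Clinton, Donald Trump"


result = chain_of_verification(scripted_llm, "Name some politicians who were born in New York City.")
print(result)
```

The n calls in Step 3 are the main added cost of this sketch compared with a single prompt.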
Note that, unlike the examples above, the original paper uses Few-Shot Prompting to carry out the entire process.
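Since each step can be driven with Few-Shot Prompting, a helper that prepends worked demonstrations to the planning prompt might look like the sketch below. The demonstration is made up for illustration and is not taken from the paper; `few_shot_prompt` and `PLANNING_DEMOS` are hypothetical names.

```python
from typing import List, Tuple

# Hypothetical demonstration for the planning step; not the paper's examples.
PLANNING_DEMOS: List[Tuple[str, str]] = [
    (
        "Question: Name some physicists born in Germany.\n"
        "Draft answer: Albert Einstein, Max Planck",
        "Where was Albert Einstein born?\nWhere was Max Planck born?",
    ),
]


def few_shot_prompt(instruction: str, demos: List[Tuple[str, str]], new_input: str) -> str:
    """Prefix the new input with worked input/output pairs so the model copies the format."""
    blocks = [instruction]
    for demo_input, demo_output in demos:
        blocks.append(f"{demo_input}\nVerification questions:\n{demo_output}")
    # The new input ends with the same label, leaving the model to fill in the questions.
    blocks.append(f"{new_input}\nVerification questions:")
    return "\n\n".join(blocks)


prompt = few_shot_prompt(
    "Write verification questions that check each claim in the draft answer.",
    PLANNING_DEMOS,
    "Question: Name some politicians who were born in New York City.\n"
    "Draft answer: Hillary Clinton, Donald Trump, Michael Bloomberg",
)
print(prompt)
```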
Experiments using CoVe for list-based and long-form generation show that:
Test Precision and average number of positive and negative (hallucination)
CoVe's performance against Zero-Shot and Few-Shot in closed book MultiSpanQ
CoVe against InstructGPT, ChatGPT and PerplexityAI
Hallucinations are common in LLM responses, especially when the response is a long passage of multiple sentences, and they degrade its quality. CoVe is a simple and effective technique for reducing hallucinations in LLM responses without any training or fine-tuning. However, the paper does not study how effective CoVe is against hallucinations other than factual inaccuracies.