Generated Knowledge

Last updated on August 7, 2024

The idea behind the generated knowledge approach is to ask the LLM to generate potentially useful information about a given question/prompt before generating a final response.

For example, say you want to write a small blog post about an animal, such as the Spirit bear (also known as the Kermode bear). Before asking the LLM to write the blog post, you can ask it to generate some facts about Spirit bears. This will help the LLM write a more informative blog post. There are two simple ways of going about this.

Single prompt approach

The first approach asks the LLM to generate knowledge then write the post, all with a single prompt.

Generate 4 facts about the Kermode bear, then use these facts to write a short blog post using the information:

The Kermode bear, also known as the spirit bear, is a rare subspecies of the American black bear that is found in the coastal regions of British Columbia, Canada.

1. The Kermode bear is a unique subspecies of the American black bear, with approximately 10% of the population having white fur.

2. The Kermode bear is a symbol of the First Nations people of British Columbia, and is seen as a spirit animal.

3. The Kermode bear is an important part of the ecosystem, helping to disperse seeds and maintain the balance of the forest.

4. The Kermode bear is a protected species, and its population is monitored closely by the provincial government.

The Kermode bear is a unique and special animal that is found in the coastal regions of British Columbia, Canada. Approximately 10% of the population of this subspecies of the American black bear have white fur, making them a rare sight. The Kermode bear is a symbol of the First Nations people of British Columbia, and is seen as a spirit animal. It is an important part of the ecosystem, helping to disperse seeds and maintain the balance of the forest. The Kermode bear is a protected species, and its population is monitored closely by the provincial government. It is a beautiful and majestic animal, and it is important that we do our part to protect and conserve this species.

Dual prompt approach

In this approach, we first prompt the LLM to generate facts about the bear:

Generate 10 facts about the Kermode bear:

1. The Kermode bear, also known as the Spirit Bear, is a rare subspecies of the American black bear found in British Columbia, Canada.

2. The Kermode bear has a unique white or cream-colored coat, which is caused by a recessive gene.

3. The Kermode bear is a symbol of hope and renewal for the First Nations people of British Columbia.

4. The Kermode bear is a protected species and is listed as a species of special concern by the Committee on the Status of Endangered Wildlife in Canada.

Then, we feed that information into another prompt to write the blog post:

1. The Kermode bear, also known as the Spirit Bear, is a rare subspecies of the American black bear found in British Columbia, Canada.

2. The Kermode bear has a unique white or cream-colored coat, which is caused by a recessive gene.

3. The Kermode bear is a symbol of hope and renewal for the First Nations people of British Columbia.

4. The Kermode bear is a protected species and is listed as a species of special concern by the Committee on the Status of Endangered Wildlife in Canada.

Use the above facts to write a one paragraph blog post about the Kermode bear:



The Kermode bear, also known as the Spirit Bear, is a rare subspecies of the American black bear found in British Columbia, Canada. This unique bear has a white or cream-colored coat, caused by a recessive gene, and is a symbol of hope and renewal for the First Nations people of British Columbia. The Kermode bear is a protected species and is listed as a species of special concern by the Committee on the Status of Endangered Wildlife in Canada, making it an important part of the region's biodiversity.

I recommend this second approach since it can produce longer content more reliably.
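The two-step flow above can be sketched in Python. This is a minimal sketch under stated assumptions: `generate_post_with_knowledge` is a hypothetical helper name, `llm` stands for any function that takes a prompt string and returns the model's text, and `fake_llm` is a stub that exists only so the example runs without an API.

```python
from typing import Callable


def generate_post_with_knowledge(
    llm: Callable[[str], str], topic: str, n_facts: int = 4
) -> str:
    """Dual prompt approach: generate facts first, then write from them."""
    # Prompt 1: knowledge generation.
    facts = llm(f"Generate {n_facts} facts about {topic}:").strip()
    # Prompt 2: put the generated facts in front of the writing instruction.
    post_prompt = (
        f"{facts}\n\n"
        f"Use the above facts to write a one paragraph blog post about {topic}:"
    )
    return llm(post_prompt).strip()


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (assumption, for illustration only)."""
    if prompt.startswith("Generate"):
        return "1. Placeholder fact one.\n2. Placeholder fact two."
    # Echo the first line of the prompt to show the facts were passed along.
    return "POST WRITTEN FROM: " + prompt.splitlines()[0]


print(generate_post_with_knowledge(fake_llm, "the Kermode bear"))
```

Keeping the two calls separate also lets you inspect or edit the facts before they reach the second prompt.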

Another use case

The generated knowledge approach was originally introduced for a different task: answering difficult questions, specifically commonsense reasoning questions. Consider the following question, which GPT-3 answers incorrectly:

Note

This example may not be accurate. We are working to revise it.

If we first ask the LLM to generate facts about Congo and South Africa, we can then use that information to answer the question correctly. Conceptually, this is similar to chain-of-thought (CoT) prompting, since we are effectively getting the LLM to generate intermediate reasoning in the form of related facts.

Let's start with the first step, knowledge generation. We can ask the LLM to generate facts about Congo and South Africa:

Next, let's use that knowledge to answer the question correctly. This is the knowledge integration step!

A more technical discussion

Although the use case above is similar to the way generated knowledge was originally introduced, it is not exactly the same. The content below covers the more technical context in which the approach was introduced. It follows the same two-step pattern that we saw above: knowledge generation, then knowledge integration.

Generated Knowledge (Liu et al.)

Knowledge Generation

In the knowledge generation step, the LLM is asked to generate a set of facts about the question. The LLM is prompted in few-shot fashion as seen below. M different completions are generated using this same prompt (similar to the self-consistency approach).

Generated Knowledge Example (Liu et al.)
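As a rough illustration of the knowledge generation step, the sketch below builds a few-shot prompt and samples M completions from it. The prompt template, the exemplar, and the canned completions are illustrative assumptions, not the exact prompt or data from Liu et al.; `sample` is a hypothetical callable that returns one sampled completion per call. Dropping empty and duplicate completions is a choice made in this sketch.

```python
import itertools
from typing import Callable, List, Tuple


def build_knowledge_prompt(exemplars: List[Tuple[str, str]], question: str) -> str:
    """Assemble (input, knowledge) exemplars followed by the new question."""
    lines = ["Generate some knowledge about the input. Examples:", ""]
    for example_question, example_knowledge in exemplars:
        lines += [f"Input: {example_question}", f"Knowledge: {example_knowledge}", ""]
    lines += [f"Input: {question}", "Knowledge:"]
    return "\n".join(lines)


def generate_knowledge(
    sample: Callable[[str], str],
    exemplars: List[Tuple[str, str]],
    question: str,
    m: int,
) -> List[str]:
    """Sample m completions from the same prompt; keep distinct, non-empty ones."""
    prompt = build_knowledge_prompt(exemplars, question)
    statements: List[str] = []
    for _ in range(m):
        statement = sample(prompt).strip()
        if statement and statement not in statements:
            statements.append(statement)
    return statements


# Illustrative exemplar and canned completions (assumptions for this sketch).
exemplars = [
    ("Penguins have <mask> wings.", "Birds have two wings. Penguins are a kind of bird."),
]
canned = itertools.cycle([
    "Kangaroos are marsupials that live in Australia.",
    "Kangaroos are marsupials that have 5 limbs.",
    "Kangaroos are marsupials that live in Australia.",
])


def fake_sample(prompt: str) -> str:
    """Stand-in for sampling from a model with temperature > 0."""
    return next(canned)


for statement in generate_knowledge(
    fake_sample, exemplars, "Most Kangaroos have <mask> limbs.", m=3
):
    print(statement)
```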

Knowledge Integration

Next, we generate "knowledge augmented" questions and prompt the LLM with them to get final answers. The easiest way to understand this is to go through an example.

Let's assume we are attempting to answer the question "Most Kangaroos have <mask> limbs". Assume that at the knowledge generation step we generated two knowledge statements (M=2):

Knowledge 1: Kangaroos are marsupials that live in Australia.
Knowledge 2: Kangaroos are marsupials that have 5 limbs.

Now, we concatenate each knowledge with the question to generate knowledge augmented questions:

Knowledge Augmented Question 1: Most Kangaroos have <mask> limbs. Kangaroos are marsupials that live in Australia.
Knowledge Augmented Question 2: Most Kangaroos have <mask> limbs. Kangaroos are marsupials that have 5 limbs.

We then prompt the LLM with these knowledge augmented questions and get the final answer proposals:

Answer 1: 4
Answer 2: 5

We select the answer proposal to which the LLM assigns the highest probability as the final answer. This probability can be measured as the softmax probability of the answer token, or as the log probability of the answer token(s).
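The selection step can be sketched as follows. This is a minimal sketch, not the authors' implementation: `score` is a hypothetical callable that returns the model's answer for an augmented question together with its log probability, and the probabilities in `fake_scores` are made-up numbers chosen only to exercise the selection logic.

```python
import math
from typing import Callable, List, Tuple


def integrate_knowledge(
    score: Callable[[str], Tuple[str, float]],
    question: str,
    knowledge: List[str],
) -> Tuple[str, str, float]:
    """Return (answer, selected_knowledge, log_prob) for the most confident proposal."""
    if not knowledge:
        raise ValueError("need at least one knowledge statement")
    best = None
    for statement in knowledge:
        # Knowledge augmented question: the question followed by one statement.
        augmented = f"{question} {statement}"
        answer, log_prob = score(augmented)
        if best is None or log_prob > best[2]:
            best = (answer, statement, log_prob)
    return best


# Made-up scores for illustration only (not model output).
fake_scores = {
    "Kangaroos are marsupials that live in Australia.": ("4", math.log(0.55)),
    "Kangaroos are marsupials that have 5 limbs.": ("5", math.log(0.80)),
}


def fake_score(augmented_question: str) -> Tuple[str, float]:
    """Stand-in for querying a model for its top answer and that answer's log probability."""
    for statement, result in fake_scores.items():
        if augmented_question.endswith(statement):
            return result
    raise KeyError(augmented_question)


answer, selected, log_prob = integrate_knowledge(
    fake_score, "Most Kangaroos have <mask> limbs.", list(fake_scores)
)
print(answer, "|", selected)
```

Because the logarithm is monotonic, comparing log probabilities picks the same single-token answer as comparing softmax probabilities. For multi-token answers, the log probabilities of the tokens are summed.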

Recitation-Augmented Language Models

The recitation-augmented approach is similar to generated knowledge (basically the same). However, it is much less complex than the formal implementation of generated knowledge.

The idea here is to few-shot prompt the LLM to generate information and answer in the same step. The fact that it is reciting/generating knowledge and answering the question in the same step is the main difference from the generated knowledge approach.

To reiterate, this approach prompts the model with multiple (question, recitation, answer) exemplars, then asks the question. The authors note that this approach can be combined with self-consistency or multiple completion paths.
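A prompt of this shape could be assembled as in the sketch below. The `Question:` / `Recitation:` / `Answer:` field labels, both helper names, and the sample completion are assumptions made for illustration, not the exact format used by Sun et al. The exemplar reuses a fact from the Kermode bear example earlier in this article.

```python
from typing import List, Tuple


def build_recitation_prompt(
    exemplars: List[Tuple[str, str, str]], question: str
) -> str:
    """Few-shot prompt of (question, recitation, answer) exemplars, then the new question."""
    blocks = [
        f"Question: {q}\nRecitation: {r}\nAnswer: {a}" for q, r, a in exemplars
    ]
    # Leave the recitation open so the model recites and then answers in one completion.
    blocks.append(f"Question: {question}\nRecitation:")
    return "\n\n".join(blocks)


def split_completion(completion: str) -> Tuple[str, str]:
    """Split one completion into its recitation and its answer."""
    recitation, separator, answer = completion.partition("Answer:")
    if not separator:
        raise ValueError("completion has no 'Answer:' field")
    return recitation.strip(), answer.strip()


exemplars = [
    (
        "What causes the Kermode bear's white coat?",
        "The Kermode bear has a unique white or cream-colored coat, "
        "which is caused by a recessive gene.",
        "A recessive gene",
    ),
]
print(build_recitation_prompt(exemplars, "Where is the Kermode bear found?"))

# Hypothetical completion, written by hand to show the parsing step.
sample_completion = (
    " The Kermode bear is a rare subspecies of the American black bear "
    "found in British Columbia, Canada.\nAnswer: British Columbia, Canada"
)
print(split_completion(sample_completion))
```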

Notes

Generated knowledge shows improvements on various commonsense datasets.
The knowledge corresponding to the selected answer is called the selected knowledge.
In practice, you could take the most frequently occurring answer as the final one.
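Taking the most frequently occurring answer, as the last note suggests, needs no token probabilities. A minimal sketch, assuming the answer proposals have already been collected as strings (`majority_answer` is a hypothetical helper name, and the whitespace and case normalization is a choice made here):

```python
from collections import Counter
from typing import List


def majority_answer(answers: List[str]) -> str:
    """Return the most frequent answer after trimming whitespace and lowercasing."""
    normalized = [a.strip().lower() for a in answers if a.strip()]
    if not normalized:
        raise ValueError("no non-empty answers to vote on")
    # On a tie, most_common keeps the answer that was encountered first.
    return Counter(normalized).most_common(1)[0][0]


print(majority_answer(["4", "5", " 4 "]))
```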

Sander Schulhoff

Sander Schulhoff is the CEO of HackAPrompt and Learn Prompting. He created the first Prompt Engineering guide on the internet, two months before ChatGPT was released, which has taught 3 million people how to prompt ChatGPT. He also partnered with OpenAI to run the first AI Red Teaming competition, HackAPrompt, which was 2x larger than the White House's subsequent AI Red Teaming competition. Today, HackAPrompt partners with the Frontier AI labs to produce research that makes their models more secure. Sander's background is in Natural Language Processing and deep reinforcement learning. He recently led the team behind The Prompt Report, the most comprehensive study of prompt engineering ever done. This 76-page survey, co-authored with OpenAI, Microsoft, Google, Princeton, Stanford, and other leading institutions, analyzed 1,500+ academic papers and covered 200+ prompting techniques.

Footnotes

Liu, J., Liu, A., Lu, X., Welleck, S., West, P., Bras, R. L., Choi, Y., & Hajishirzi, H. (2021). Generated Knowledge Prompting for Commonsense Reasoning.
Sun, Z., Wang, X., Tay, Y., Yang, Y., & Zhou, D. (2022). Recitation-Augmented Language Models.
