Prompt Mining is a technique used to identify the best prompt template for a given relation from a corpus of text (Jiang et al., 2019). Similar to traditional mining, where you search for valuable resources, in prompt mining you use algorithms to uncover the prompt template that yields the most accurate results.
The key point here is that Prompt Mining isn’t about selecting the best template for any general task. Instead, it’s focused on improving how Large Language Models (LLMs) retrieve factual knowledge (Petroni et al., 2019). Essentially, it boosts accuracy by discovering the language patterns and templates the model has "learned" best during training. The goal is to find prompts that consistently trigger the model to predict correct factual information.
A prompt template is a structured format for presenting questions or statements to the model, often with placeholders for customization. For example:
Q: Why is the sky blue?
A:
In this case, the template would be:
Q: {question}?
A:
Alternatively, another template could guide the model to complete a statement:
The sky is blue because ...
[x] is [y] because ...
As you can see, in both cases, the user intent is the same. However, Prompt Mining seeks the template the model is most 'acquainted' with based on its training data. Being 'acquainted' means the template frequently appears in the corpus, reflecting language patterns familiar to the model, even if not in exactly the same form.
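Concretely, a template is just a string with placeholders that you fill in per query. Here is a minimal sketch in Python, using the [x]/[y] placeholder syntax from the examples above:

```python
# A minimal sketch of filling a prompt template; the [x]/[y]
# placeholder syntax follows the examples above.
template = "[x] is [y] because ..."

prompt = template.replace("[x]", "The sky").replace("[y]", "blue")
print(prompt)  # The sky is blue because ...
```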
Prompt Mining is a two-stage process:

1. Generate candidate prompt templates from a text corpus.
2. Select the best template using a metric such as frequency or accuracy against ground truth.
Let’s break these stages down in more detail.
In the first stage, you need a large corpus of text that is representative of the data the model was trained on. For instance, if your model was trained on Wikipedia and research papers, you should use those as your corpus.
Here are two common methods for generating prompt templates:
The first method, middle-word mining, extracts prompts from the corpus by identifying sentences that contain known subject-object pairs. The words between the subject and the object typically express the relation and can be converted into a prompt template. For example:
[x] was born in [y]
A variant of this method uses syntactic (dependency) analysis to identify the relationship between (subject, object) pairs, even when the relation words don’t sit directly between them. This variant is flexible because it doesn’t require a manually created reference prompt, making it applicable to various types of relations.
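To make the middle-word variant concrete, here is a minimal sketch; the toy corpus and (subject, object) pairs are invented for illustration:

```python
# A minimal middle-word mining sketch: for each known (subject, object)
# pair, take the words between them in a sentence as a candidate
# template. The toy corpus and pairs are assumptions for illustration.
corpus = [
    "Alan Turing was born in London in 1912.",
    "Ada Lovelace was born in London.",
]
pairs = [("Alan Turing", "London"), ("Ada Lovelace", "London")]

templates = []
for sentence in corpus:
    for subj, obj in pairs:
        s, o = sentence.find(subj), sentence.find(obj)
        if s != -1 and o != -1 and s < o:
            middle = sentence[s + len(subj):o].strip()
            templates.append(f"[x] {middle} [y]")

print(templates)  # ['[x] was born in [y]', '[x] was born in [y]']
```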
The second method, paraphrasing, starts with a seed prompt and aims to improve lexical diversity by generating paraphrases that preserve the original meaning. For example:
[x] shares a border with [y]
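One common way to generate such paraphrases is back-translation, as in Jiang et al. (2019). Below is a sketch using Hugging Face translation pipelines; the specific Marian models are one possible choice, and in practice the [x]/[y] placeholders usually need to be protected so translation doesn’t mangle them:

```python
# A back-translation sketch for paraphrasing a seed prompt. The Marian
# models below are an assumption, not the only option, and real
# pipelines usually shield the [x]/[y] placeholders before translating.
from transformers import pipeline

en_to_de = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")
de_to_en = pipeline("translation", model="Helsinki-NLP/opus-mt-de-en")

seed = "[x] shares a border with [y]"
german = en_to_de(seed)[0]["translation_text"]
paraphrase = de_to_en(german)[0]["translation_text"]
print(paraphrase)  # e.g. "[x] borders on [y]" (output varies by model)
```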
In the second stage, once you’ve generated a set of prompt templates, you use a selection metric to find the optimal one. The simplest approach is to select the template that appears most frequently in the corpus.
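As a sketch, frequency-based selection is just counting how often each mined template occurs; the toy list below stands in for the output of Stage 1:

```python
# Frequency-based selection: count mined templates and pick the most
# common one. The toy list is an assumption standing in for Stage 1.
from collections import Counter

mined = [
    "[y] introduced the [x]",
    "[y] introduced the [x]",
    "[y] released the [x]",
    "[y] introduced the [x]",
]

counts = Counter(mined)
best_template, _ = counts.most_common(1)[0]
print(best_template)  # [y] introduced the [x]
```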
A more advanced approach is to select the template that produces the most accurate results, measured against ground-truth data. Ground truth refers to the correct labels or facts in a dataset that are used to evaluate the model's predictions. For example, if you're working with the relation "is owned by" and the fact you're testing is "YouTube is owned by Alphabet," the ground-truth object is "Alphabet." Accuracy is then the fraction of facts for which the model's predicted object matches the ground-truth object.
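Here is a minimal sketch of that accuracy computation; the predictions and gold objects are invented for illustration:

```python
# A minimal sketch of ground-truth evaluation. The gold objects and
# model predictions below are toy data, not a real benchmark.
def accuracy(predictions, gold):
    """Fraction of predicted objects matching the ground-truth objects."""
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

gold = ["Alphabet", "Microsoft", "Meta"]          # ground-truth objects
predictions = ["Alphabet", "Microsoft", "Apple"]  # model outputs per fact
print(accuracy(predictions, gold))  # 0.666...
```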
Let’s say you’re working with the relation "manufacturer." To illustrate the difference, suppose you start with a manual prompt, one you’ve written based on intuition:
[y] manufactured the [x]
This serves as your baseline. Now you turn to Stage 1 and mine prompt templates from a Wikipedia corpus, which gives you a list of candidates. Next, you use a metric to select the best template, for example by choosing the one that appears most frequently. Here are the results:
| Template | Frequency |
| --- | --- |
| [y] introduced the [x] | 0.5940 |
| [y] released the [x] | 0.0022 |
| [x] attributed to the [y] | 0.1109 |
| [y] sprinter [x] | 0.00005 |
| [y] announced the [x] | 0.2857 |
| [y] launched the [x] | 0.0040 |
| [y] introduces the [x] | 0.00057 |
As you can see, the most frequent prompt template is "[y] introduced the [x]." Now you can compare the performance of the manual prompt versus the one identified through Prompt Mining.
Manual: [y] manufactured the [x]
Mined: [y] introduced the [x]
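As a sketch of this comparison, you could query a masked language model with each template and measure how often its top prediction matches the ground-truth manufacturer. The model choice and toy facts below are assumptions, and actual scores will vary:

```python
# A sketch comparing the manual and mined templates with a masked LM.
# bert-base-cased and the toy facts are assumptions for illustration.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-cased")

facts = [("iPhone", "Apple"), ("Walkman", "Sony")]  # ([x], gold [y]) pairs
templates = ["[y] manufactured the [x]", "[y] introduced the [x]"]

for template in templates:
    hits = 0
    for x, y in facts:
        # Mask the object slot and fill in the subject.
        prompt = template.replace("[y]", fill.tokenizer.mask_token)
        prompt = prompt.replace("[x]", x)
        top = fill(prompt)[0]["token_str"].strip()
        hits += (top == y)
    print(f"{template!r}: accuracy {hits / len(facts):.2f}")
```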
While Prompt Mining can enhance accuracy, it comes with a few limitations:
- Computational cost: Mining large text corpora is resource-intensive and computationally expensive. The potential performance gains might not always justify the computing power required.
- Minimal performance gains: Sometimes the improvement in performance is marginal, and in certain cases a mined prompt template can even produce worse outcomes if it doesn’t align with the specific nuances of the task.
Prompt Mining is a powerful technique that helps improve the accuracy of large language models by identifying the most effective prompt templates based on their training data. By finding patterns the model is familiar with, it enhances the likelihood of retrieving factual information more reliably.
Jiang, Z., Xu, F. F., Araki, J., & Neubig, G. (2019). How Can We Know What Language Models Know? https://arxiv.org/abs/1911.12543

Petroni, F., Rocktäschel, T., Lewis, P., Bakhtin, A., Wu, Y., Miller, A. H., & Riedel, S. (2019). Language Models as Knowledge Bases? https://arxiv.org/abs/1909.01066