When crafting prompts for large language models (LLMs), there are several factors to consider. The format of the exemplars and the labelspace both play crucial roles in the effectiveness of the prompt.

The Importance of Format

The format of the exemplars in a prompt is crucial. It shows the LLM how to structure its response. For instance, if the exemplars give their answers as words in all capital letters, the LLM will typically follow suit, even if the answers provided are incorrect.

Consider the following example:

What is 2+2?
FIFTY
What is 20+5?
FORTY-THREE
What is 12+9?
TWENTY-ONE

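Here the first two question-answer pairs are the exemplars, and the final answer (TWENTY-ONE) is the LLM's output. Despite the incorrect answers in the exemplars, the LLM formats its response in all capital letters, as the exemplars do.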
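A minimal sketch of how such a prompt could be assembled in Python. The helper name build_few_shot_prompt and the one-line-per-question layout are assumptions made for illustration; sending the prompt to a model is left out, since that depends on the API you use.

```python
def build_few_shot_prompt(exemplars, query):
    """Join (question, answer) exemplars and a final unanswered question.

    Hypothetical helper: the layout mirrors the example above, with each
    question and each answer on its own line.
    """
    lines = []
    for question, answer in exemplars:
        lines.append(question)
        lines.append(answer)
    lines.append(query)  # left unanswered for the LLM to complete
    return "\n".join(lines)


# The answers are deliberately wrong; only their all-caps format matters here.
exemplars = [
    ("What is 2+2?", "FIFTY"),
    ("What is 20+5?", "FORTY-THREE"),
]

prompt = build_few_shot_prompt(exemplars, "What is 12+9?")
print(prompt)
```

Keeping every exemplar in exactly the same layout is the point of the helper: the model has only the exemplars to infer the expected answer format from.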

Ground Truth: Not as Important as You Might Think

Interestingly, the actual answers or 'ground truth' in the exemplars are not as important as one might think. Min et al. (2022) found that replacing the labels in the exemplars with random ones (similar to the incorrect answers in the example above) has little impact on performance across a range of classification and multiple-choice tasks. This means that the LLM can still generate a correct response even if the exemplars contain incorrect information.

The Role of Labelspace

While the ground truth may not be crucial, the labelspace is. The labelspace refers to the list of possible labels for a given task. For example, in a classification task, the labelspace might include "positive" and "negative".
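As a rough illustration, exemplars with random labels drawn from a labelspace could be produced as follows. The function randomize_labels, the LABELSPACE list, and the reviews are all made up for this sketch and do not come from the cited research.

```python
import random

# Hypothetical labelspace for a sentiment classification task.
LABELSPACE = ["positive", "negative"]


def randomize_labels(exemplars, labelspace, seed=0):
    """Return exemplars with each label replaced by a random draw from labelspace."""
    rng = random.Random(seed)  # seeded so the resulting prompt is reproducible
    return [(text, rng.choice(labelspace)) for text, _label in exemplars]


# Made-up reviews paired with their true labels.
gold_exemplars = [
    ("The pasta was wonderful.", "positive"),
    ("Service was slow and rude.", "negative"),
    ("Best brunch in town.", "positive"),
    ("My soup arrived cold.", "negative"),
]

random_exemplars = randomize_labels(gold_exemplars, LABELSPACE)
for text, label in random_exemplars:
    print(f"{text} -> {label}")
```

Every label still comes from the labelspace, so the exemplars keep showing the model which outputs are allowed even when an individual label is wrong.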

Providing random labels from the labelspace in the exemplars can help the LLM understand the labelspace better, leading to improved results. Furthermore, it's important to represent the distribution of the labelspace accurately in the exemplars. Instead of sampling uniformly from the labelspace, it's better to sample according to the true distribution of the labels. For example, if you have a dataset of restaurant reviews and 60% of them are positive, your prompt should contain a 3:2 ratio of positive to negative exemplars.
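One way to turn a label distribution into exemplar counts is sketched below. The helpers allocate_slots and sample_exemplars, the largest-remainder rounding, and the review pool are assumptions made for illustration, not a prescribed method.

```python
import random


def allocate_slots(label_counts, k):
    """Split k exemplar slots across labels in proportion to label_counts.

    Uses integer arithmetic with largest-remainder rounding so the
    slots always sum to exactly k.
    """
    total = sum(label_counts.values())
    slots = {label: (n * k) // total for label, n in label_counts.items()}
    leftover = k - sum(slots.values())
    by_remainder = sorted(
        label_counts,
        key=lambda label: (label_counts[label] * k) % total,
        reverse=True,
    )
    for label in by_remainder[:leftover]:
        slots[label] += 1
    return slots


def sample_exemplars(pool, label_counts, k, seed=0):
    """Pick k (text, label) exemplars from pool, matching the label distribution."""
    rng = random.Random(seed)
    chosen = []
    for label, n in allocate_slots(label_counts, k).items():
        candidates = [ex for ex in pool if ex[1] == label]
        chosen.extend(rng.sample(candidates, n))  # pool needs >= n per label
    rng.shuffle(chosen)  # avoid grouping all exemplars of one label together
    return chosen


# Hypothetical dataset statistics: 60% positive, 40% negative reviews.
label_counts = {"positive": 60, "negative": 40}
slots = allocate_slots(label_counts, 5)

# Made-up pool of labeled reviews to draw exemplars from.
pool = [
    ("Great tacos.", "positive"),
    ("Lovely patio.", "positive"),
    ("Friendly staff.", "positive"),
    ("Fresh sushi.", "positive"),
    ("Overpriced and bland.", "negative"),
    ("Waited an hour.", "negative"),
    ("Dirty tables.", "negative"),
]
chosen = sample_exemplars(pool, label_counts, 5)
print(slots)
```

With five exemplars and a 60/40 split, this allocation gives three positive and two negative slots, which is the 3:2 ratio described above.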

Additional Tips

When creating prompts, using between 4 and 8 exemplars tends to yield good results. However, it can often be beneficial to include as many exemplars as possible.

In conclusion, understanding the importance of format, ground truth, and labelspace can greatly enhance the effectiveness of your prompts.

Footnotes

See the vocabulary reference for more info.
Min, S., Lyu, X., Holtzman, A., Artetxe, M., Lewis, M., Hajishirzi, H., & Zettlemoyer, L. (2022). Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?

Sander Schulhoff

Sander Schulhoff is the Founder of Learn Prompting and an ML Researcher at the University of Maryland. He created the first open-source Prompt Engineering guide, reaching 3M+ people and teaching them to use tools like ChatGPT. Sander also led a team behind Prompt Report, the most comprehensive study of prompting ever done, co-authored with researchers from the University of Maryland, OpenAI, Microsoft, Google, Princeton, Stanford, and other leading institutions. This 76-page survey analyzed 1,500+ academic papers and covered 200+ prompting techniques.