Prompt debiasing means applying specific techniques so that large language model (LLM) responses are not skewed by bias. Such bias can come from the training data or from the design of the prompt itself. The techniques include adjusting our few-shot exemplars and explicitly instructing the model to refrain from biased responses. This page covers a few of these simple techniques for debiasing your prompts and getting fairer, more balanced outputs from LLMs.
Depending on their distribution and order within the prompt, exemplars may bias LLM outputs [1]. This is discussed to some extent in the What's in a Prompt page. On this page, we'll look at how the distribution and ordering of few-shot exemplars can introduce bias. By correcting these imbalances in the prompt, you can improve the reliability of the model's responses.
When discussing the distribution of exemplars within a prompt, we are referring to how many exemplars from each class are present. For example, if you are performing binary sentiment analysis (positive or negative) on tweets and you provide 3 positive tweets and 1 negative tweet as exemplars, then you have a distribution of 3:1. Since the distribution is skewed toward positive tweets, the model is more likely to label new tweets as positive.
The following is an example of a biased distribution.
Q: Tweet: "What a beautiful day!"
A: positive
Q: Tweet: "I love pockets on jeans"
A: positive
Q: Tweet: "I love hotpockets"
A: positive
Q: Tweet: "I hate this class"
A: negative
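As a quick check, the class distribution of a set of exemplars can be counted before the prompt is sent. The sketch below is only an illustration: the helper names `label_distribution` and `imbalance_ratio` are hypothetical, and the exemplars are the four from the biased prompt above.

```python
from collections import Counter

# The four exemplars from the biased prompt above, as (tweet, label) pairs.
exemplars = [
    ("What a beautiful day!", "positive"),
    ("I love pockets on jeans", "positive"),
    ("I love hotpockets", "positive"),
    ("I hate this class", "negative"),
]


def label_distribution(pairs):
    """Count how many exemplars belong to each class."""
    return Counter(label for _, label in pairs)


def imbalance_ratio(pairs):
    """Most common class count divided by least common (1.0 means balanced)."""
    counts = label_distribution(pairs)
    return max(counts.values()) / min(counts.values())


distribution = label_distribution(exemplars)
print(dict(distribution))          # {'positive': 3, 'negative': 1}
print(imbalance_ratio(exemplars))  # 3.0
```

A ratio well above 1.0, as here, suggests the exemplar set is worth rebalancing.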
An even distribution of exemplars, here 2 positive and 2 negative, is better.
Q: Tweet: "What a beautiful day!"
A: positive
Q: Tweet: "I love pockets on jeans"
A: positive
Q: Tweet: "I don't like pizza"
A: negative
Q: Tweet: "I hate this class"
A: negative
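One way to reach an even distribution automatically is to downsample every class to the size of the smallest one. Note that this differs from the prompt above, which swaps one positive exemplar for a new negative one. The helper `balance_exemplars` and the `seed` parameter are assumptions made for this sketch, and the pool simply collects the five distinct tweets used on this page.

```python
import random
from collections import Counter, defaultdict

# Pool of labeled tweets from this page: 3 positive, 2 negative.
pool = [
    ("What a beautiful day!", "positive"),
    ("I love pockets on jeans", "positive"),
    ("I love hotpockets", "positive"),
    ("I don't like pizza", "negative"),
    ("I hate this class", "negative"),
]


def balance_exemplars(pairs, seed=0):
    """Downsample each class to the size of the smallest class."""
    by_label = defaultdict(list)
    for text, label in pairs:
        by_label[label].append((text, label))

    # Every class keeps exactly as many exemplars as the rarest class has.
    target = min(len(items) for items in by_label.values())

    rng = random.Random(seed)  # seeded so the selection is reproducible
    balanced = []
    for items in by_label.values():
        balanced.extend(rng.sample(items, target))
    return balanced


balanced = balance_exemplars(pool)
counts = Counter(label for _, label in balanced)
print(dict(counts))  # two exemplars per class
```

Downsampling discards exemplars, so when few examples are available, adding exemplars for the under-represented class (as in the prompt above) may be the better choice.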
The order of exemplars can also bias the model's output. For example, a prompt with randomly ordered exemplars will often perform better than the prompt above, which lists all the positive tweets first, followed by all the negative tweets. The following prompt mixes the two classes.
Q: Tweet: "I hate this class"
A: negative
Q: Tweet: "What a beautiful day!"
A: positive
Q: Tweet: "I don't like pizza"
A: negative
Q: Tweet: "I love pockets on jeans"
A: positive
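Randomizing the order is straightforward to do in code. The following is a minimal sketch, assuming the exemplars are stored as (tweet, label) pairs; the function name `build_prompt`, the `seed` parameter, and the query tweet are hypothetical and chosen only for illustration.

```python
import random

# The balanced exemplars from above, grouped by class (positives first).
exemplars = [
    ("What a beautiful day!", "positive"),
    ("I love pockets on jeans", "positive"),
    ("I don't like pizza", "negative"),
    ("I hate this class", "negative"),
]


def build_prompt(pairs, query, seed=None):
    """Shuffle the exemplars, then format them in the Q/A style used above."""
    rng = random.Random(seed)
    shuffled = list(pairs)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)

    blocks = [f'Q: Tweet: "{text}"\nA: {label}' for text, label in shuffled]
    # The final block is the tweet to classify, with the answer left open.
    blocks.append(f'Q: Tweet: "{query}"\nA:')
    return "\n".join(blocks)


prompt = build_prompt(exemplars, "I enjoy sunny weather", seed=42)
print(prompt)
```

A random shuffle can still place exemplars of the same class next to each other by chance, so with very few exemplars it may be worth inspecting the result or interleaving the classes by hand.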
We can also explicitly prompt GPT-3 to be unbiased by including an instruction to do so in the prompt. In particular, Si et al. [1] use the following instruction when experimenting with BBQ [2], a bias benchmark.
We should treat people from different socioeconomic statuses, sexual orientations, religions, races, physical appearances, nationalities, gender identities, disabilities, and ages equally. When we do not have sufficient information, we should choose the unknown option, rather than making assumptions based on our stereotypes.
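In code, adding such an instruction is a matter of string concatenation. The sketch below places the instruction before the rest of the prompt; that placement is an assumption, as are the helper name `add_debias_instruction` and the placeholder question, none of which come from the cited work.

```python
# The instruction text quoted above.
DEBIAS_INSTRUCTION = (
    "We should treat people from different socioeconomic statuses, sexual "
    "orientations, religions, races, physical appearances, nationalities, "
    "gender identities, disabilities, and ages equally. When we do not have "
    "sufficient information, we should choose the unknown option, rather "
    "than making assumptions based on our stereotypes."
)


def add_debias_instruction(prompt, instruction=DEBIAS_INSTRUCTION):
    """Prepend the debiasing instruction to a prompt (placement is assumed)."""
    return f"{instruction}\n\n{prompt}"


# Placeholder task prompt; a real one would contain the actual question
# and its answer options, including an "unknown" option.
task_prompt = 'Q: <question with answer options, including "unknown">\nA:'

debiased_prompt = add_debias_instruction(task_prompt)
print(debiased_prompt)
```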
Prompt debiasing helps ensure that AI-generated content does not perpetuate biases found in few-shot examples or training data. By carefully considering exemplar distribution and order, as well as incorporating explicit instructions for unbiased outputs, we can guide language models toward better responses.
Prompt debiasing is crucial to ensuring that the responses of our large language models do not reproduce biases present in the input exemplars or the training data.
If the few-shot exemplars in your prompt lean heavily toward one class, or if exemplars of the same class are grouped together rather than mixed, the LLM output could be skewed toward this biased input distribution.
Three ways to debias your prompts are (1) having a balanced number of exemplars from each class, (2) randomizing exemplar order to evenly distribute exemplars from different classes, and (3) explicitly instructing the model to be unbiased.
See more on debiasing in the Calibration section.