The Max Mutual Information (MMI) method (Sorensen et al., 2022) chooses the optimal prompt template for your task by scoring each candidate template with the mutual information between the prompt and the model's output, then selecting whichever template in your list maximizes that score.
Mutual information (MI) is a concept from information theory that quantifies how much information two variables share. In this case, it measures how much a given prompt reveals about the model's output. The intuition is that a prompt with high MI is more likely to produce accurate responses, even if we don’t know the "right" answer ahead of time.
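Formally, for a prompt $X$ and model output $Y$, mutual information can be written as the drop in output entropy once the prompt is fixed (this is the standard information-theoretic identity; the paper's exact estimator differs in its details):

$$I(X; Y) = H(Y) - H(Y \mid X)$$

Here $H(Y)$ is the entropy of the model's outputs overall, and $H(Y \mid X)$ is the average entropy of the output given a particular filled-in prompt. A template scores high when the model's individual answers are confident (low $H(Y \mid X)$) but vary across examples (high $H(Y)$).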
First, you gather a list of candidate prompt templates for the task, such as "What is the capital of France?" and "The capital of France is:".
Second, you feed each template into the model, generate outputs, and calculate a mutual information score for each. For example, here is the scoring of the first two templates:
Template 1
Prompt: What is the capital of France?
Output: The capital of France is Paris.
Mutual information score: 0.85

Template 2
Prompt: The capital of France is:
Output: Paris.
Mutual information score: 0.92
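The sketch below shows one way to estimate this score in Python, assuming your model API can return a probability distribution over a fixed set of candidate answers for each filled-in prompt; the function and variable names are illustrative, not taken from the paper:

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mmi_score(output_dists):
    """Estimate I(X; Y) = H(Y) - H(Y|X) for a single template.

    `output_dists` holds the model's distribution over candidate answers,
    one distribution per unlabeled example filled into the template.
    """
    n = len(output_dists)
    k = len(output_dists[0])
    # H(Y): entropy of the marginal output distribution across examples.
    marginal = [sum(d[i] for d in output_dists) / n for i in range(k)]
    h_y = entropy(marginal)
    # H(Y|X): average entropy of each example's output distribution.
    h_y_given_x = sum(entropy(d) for d in output_dists) / n
    return h_y - h_y_given_x

# Toy check: confident-but-varied outputs score higher than hedgy ones.
hedgy     = [[0.6, 0.4], [0.55, 0.45], [0.5, 0.5]]
confident = [[0.95, 0.05], [0.1, 0.9], [0.85, 0.15]]
print(mmi_score(hedgy))      # ~0.005
print(mmi_score(confident))  # ~0.49
```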
The third and last step is to choose the template with the highest mutual information score. Here that is the second template (0.92), since its output was concise and answered the prompt as efficiently as possible.
Now you fill the chosen template with your country of interest and run it through the model:

Prompt: The capital of Chad is:
Output: N'Djamena.
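Putting the three steps together, a minimal end-to-end selection loop might look like the sketch below, which reuses `mmi_score` from above; `model_output_dist` is a placeholder of our own (stubbed here with a uniform distribution), not an API from the paper or any library:

```python
templates = [
    "What is the capital of {country}?",
    "The capital of {country} is:",
]
examples = ["France", "Japan", "Brazil"]  # unlabeled -- no answers needed

def model_output_dist(prompt):
    """Stand-in for a real model call that returns a probability
    distribution over a fixed set of candidate answers. This toy stub
    returns a uniform distribution so the script runs end to end;
    replace it with your actual model API."""
    return [0.5, 0.5]

def score_template(template):
    dists = [model_output_dist(template.format(country=c)) for c in examples]
    return mmi_score(dists)

best_template = max(templates, key=score_template)
print(best_template.format(country="Chad"))  # e.g. "The capital of Chad is:"
```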
MMI is a simple and efficient approach to selecting the most effective prompt template for a given task. By scoring each template in your list with mutual information, MMI finds the one that best aligns the model's responses with your task. The method is also flexible, requires no ground-truth labels, and can be used with very few resources.
Sorensen, T., Robinson, J., Rytting, C., Shaw, A., Rogers, K., Delorey, A., Khalil, M., Fulda, N., & Wingate, D. (2022). An Information-theoretic Approach to Prompt Engineering Without Ground Truth Labels. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). https://doi.org/10.18653/v1/2022.acl-long.60