K-Nearest Neighbor (KNN) Prompting

Last updated on August 27, 2024 by Andres Caceres

Takeaways
  • K-Nearest Neighbor (KNN) selects relevant exemplars to improve few-shot prompts by finding the most similar examples to the input query.
  • Supplying the k most similar examples as exemplars can improve model accuracy, as illustrated here with examples from machine learning, chemistry, and law.
  • Limitations: KNN is computationally expensive for large datasets and requires careful k value selection.

What is K-Nearest Neighbor?

K-Nearest Neighbor (KNN) is a technique for choosing the exemplars of a standard few-shot prompt from a dataset of candidate exemplars [1]. The goal is to choose the exemplars that are most relevant to the task at hand, thereby improving the performance of the model [2].

It works by selecting the k examples from an external dataset that are most similar to the prompt that you're giving the model. The value k is chosen by the user, and it corresponds to the number of exemplars you want to include in the prompt.
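The text does not say how similarity is measured. As a minimal sketch, assume each prompt has already been converted into a numeric vector (for example by an embedding model) and that cosine similarity is the metric. The vectors and the function names `cosine_similarity` and `k_nearest` are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def k_nearest(query_vec, dataset_vecs, k):
    """Return the indices of the k vectors most similar to query_vec."""
    ranked = sorted(
        range(len(dataset_vecs)),
        key=lambda i: cosine_similarity(query_vec, dataset_vecs[i]),
        reverse=True,  # highest similarity first
    )
    return ranked[:k]

# Toy 2-D vectors (hypothetical, not real embeddings).
dataset_vecs = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.5, 0.5)]
query_vec = (1.0, 0.2)

nearest = k_nearest(query_vec, dataset_vecs, k=2)
print(nearest)  # [1, 0]
```

Here k=2 keeps the two vectors that point in nearly the same direction as the query and discards the rest.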

How to Use K-Nearest Neighbor

First, you need a dataset of example prompts and completions, like the following:

ID | Prompt | Completion
1 | What is the capital of France? | The capital of France is Paris.
2 | How tall is Mount Everest? | Mount Everest is 8,848 meters tall.
3 | Who wrote 'Romeo and Juliet'? | 'Romeo and Juliet' was written by Shakespeare.
4 | What is the largest ocean? | The Pacific Ocean is the largest ocean.
5 | When was the Declaration of Independence signed? | The Declaration of Independence was signed in 1776.
6 | Who painted the Mona Lisa? | The Mona Lisa was painted by Leonardo da Vinci.
7 | What is the speed of light? | The speed of light is approximately 299,792 kilometers per second.
8 | How long does it take to travel to the moon? | It takes about 3 days to travel to the moon.
9 | What is the capital of Germany? | The capital of Germany is Berlin.
10 | What is the boiling point of water? | The boiling point of water is 100°C or 212°F at sea level.

Then, you pass your prompt to the KNN function, which finds the k prompts in the dataset that are most similar to it. You then use those prompts, along with their completions, as exemplars in your few-shot prompt and give it to the model with your original question.
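The selection step can be sketched as follows. The KNN function is not defined in the text, so this sketch uses Jaccard word overlap as a simple stand-in for the similarity measure; an embedding-based measure is another option. The helper names `tokenize`, `jaccard`, and `knn_select`, and the query about Spain, are hypothetical.

```python
import re

# The example dataset from the table above: (id, prompt, completion).
DATASET = [
    (1, "What is the capital of France?", "The capital of France is Paris."),
    (2, "How tall is Mount Everest?", "Mount Everest is 8,848 meters tall."),
    (3, "Who wrote 'Romeo and Juliet'?", "'Romeo and Juliet' was written by Shakespeare."),
    (4, "What is the largest ocean?", "The Pacific Ocean is the largest ocean."),
    (5, "When was the Declaration of Independence signed?",
     "The Declaration of Independence was signed in 1776."),
    (6, "Who painted the Mona Lisa?", "The Mona Lisa was painted by Leonardo da Vinci."),
    (7, "What is the speed of light?",
     "The speed of light is approximately 299,792 kilometers per second."),
    (8, "How long does it take to travel to the moon?",
     "It takes about 3 days to travel to the moon."),
    (9, "What is the capital of Germany?", "The capital of Germany is Berlin."),
    (10, "What is the boiling point of water?",
     "The boiling point of water is 100°C or 212°F at sea level."),
]

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Stand-in similarity: shared words divided by total distinct words."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0

def knn_select(query, dataset, k):
    """Return the k rows whose prompts are most similar to the query."""
    ranked = sorted(dataset, key=lambda row: jaccard(query, row[1]), reverse=True)
    return ranked[:k]

selected = knn_select("What is the capital of Spain?", DATASET, k=2)
selected_ids = [row[0] for row in selected]
print(selected_ids)  # [1, 9]
```

With this stand-in measure, the two "capital of ..." questions share the most words with the query, so rows 1 and 9 are returned. A different similarity measure could rank the rows differently.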

K-Nearest Neighbor Example: Machine Learning

Your prompt is "What is overfitting in machine learning?", your k value is 2, and your dataset is as follows:

ID | Prompt | Completion
1 | Who discovered the law of gravity? | The law of gravity was discovered by Sir Isaac Newton.
2 | What is supervised learning? | Supervised learning is a type of machine learning where the model is trained on labeled data.
3 | What is the boiling point of water in Celsius? | The boiling point of water in Celsius is 100°C.
4 | Who wrote 'Pride and Prejudice'? | 'Pride and Prejudice' was written by Jane Austen.
5 | What is a neural network? | A neural network is a series of algorithms that attempt to recognize patterns in data, mimicking the human brain.
6 | What is the capital of Italy? | The capital of Italy is Rome.
7 | What is the speed of sound in air? | The speed of sound in air is approximately 343 meters per second.
8 | What is bail in court? | Bail is a temporary release of an accused person awaiting trial.
9 | What is the tallest mountain in Africa? | The tallest mountain in Africa is Mount Kilimanjaro.
10 | What is the chemical symbol for gold? | The chemical symbol for gold is Au.

Once you plug your prompt into the KNN function, it returns prompts 2 and 5 as the nearest neighbors. Now, you use those in your few-shot prompt:

Prompt

Q: What is supervised learning?
A: Supervised learning is a type of machine learning where the model is trained on labeled data.
Q: What is a neural network?
A: A neural network is a series of algorithms that attempt to recognize patterns in data, mimicking the human brain.
Q: What is overfitting in machine learning?
A:

AI Output

Overfitting in machine learning occurs when a model learns not only the underlying patterns in the training data but also the noise and outliers.
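Once the neighbors are known, assembling the few-shot prompt is mechanical. A minimal sketch, assuming the exemplars arrive as (prompt, completion) pairs and that each Q: and A: goes on its own line; `build_few_shot_prompt` is a hypothetical helper name.

```python
def build_few_shot_prompt(exemplars, query):
    """Render (prompt, completion) pairs as Q:/A: lines, ending with the open query."""
    lines = []
    for prompt, completion in exemplars:
        lines.append(f"Q: {prompt}")
        lines.append(f"A: {completion}")
    # The final question is left unanswered so the model completes it.
    lines.append(f"Q: {query}")
    lines.append("A:")
    return "\n".join(lines)

# Rows 2 and 5 of the dataset above, the neighbors reported for this example.
exemplars = [
    ("What is supervised learning?",
     "Supervised learning is a type of machine learning where the model is trained on labeled data."),
    ("What is a neural network?",
     "A neural network is a series of algorithms that attempt to recognize patterns in data, "
     "mimicking the human brain."),
]

few_shot_prompt = build_few_shot_prompt(exemplars, "What is overfitting in machine learning?")
print(few_shot_prompt)
```

The same helper works for any k, since it simply emits one Q:/A: pair per selected exemplar before the final question.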

K-Nearest Neighbor Example: Chemistry

Your prompt is "What are London Dispersion Forces?", your k value is 1, and your dataset is as follows:

ID | Prompt | Completion
1 | Who was the first president of the United States? | The first president of the United States was George Washington.
2 | What country is bordered on three sides by Senegal? | The Gambia is bordered on three sides by Senegal.
3 | How many continents are there? | There are seven continents on Earth.
4 | Who developed the periodic table? | The periodic table was developed by Dmitri Mendeleev.
5 | What was the Great Leap Forward? | The Great Leap Forward was a social and economic campaign initiated by the Chinese Communist Party under the leadership of Mao Zedong from 1958 to 1962.
6 | What is the capital of Australia? | The capital of Australia is Canberra.
7 | What is the process of rusting? | Rusting is the oxidation of iron in the presence of water and oxygen, forming iron oxide.
8 | Who wrote 'The Odyssey'? | 'The Odyssey' was written by Homer.
9 | What is the largest planet in our solar system? | The largest planet in our solar system is Jupiter.
10 | What is the freezing point of mercury? | The freezing point of mercury is -38.83°C or -37.89°F.

Once you plug your prompt into the KNN function, it returns prompt 10 as the nearest neighbor. Now, you use prompt 10 in your few-shot prompt:

Prompt

Q: What is the freezing point of mercury?
A: The freezing point of mercury is -38.83°C or -37.89°F.
Q: What are London Dispersion Forces?
A:

AI Output

London Dispersion Forces are a type of weak intermolecular force that arises from the temporary fluctuations in electron distribution within atoms and molecules.

K-Nearest Neighbor Example: Law

Your prompt is "What is prior restraint?", your k value is 3, and your dataset is as follows:

ID | Prompt | Completion
1 | What is the capital of Brazil? | The capital of Brazil is Brasília.
2 | What is the purpose of the Constitution? | The Constitution serves as the supreme law of the land, outlining the structure of government and the rights of citizens.
3 | Who painted the Mona Lisa? | The Mona Lisa was painted by Leonardo da Vinci.
4 | What is the Miranda warning? | The Miranda warning is a statement police must give to inform suspects of their rights to remain silent and have an attorney.
5 | What is the boiling point of ethanol? | The boiling point of ethanol is 78.37°C or 173.07°F.
6 | Who is the author of '1984'? | '1984' was written by George Orwell.
7 | What is the principle of double jeopardy? | Double jeopardy is a legal principle that prohibits someone from being tried twice for the same crime.
8 | What is the atomic number of hydrogen? | The atomic number of hydrogen is 1.
9 | What is the speed of light in a vacuum? | The speed of light in a vacuum is approximately 299,792 kilometers per second.
10 | Who discovered penicillin? | Penicillin was discovered by Alexander Fleming in 1928.

Once you plug your prompt into the KNN function, it returns prompts 2, 4, and 7 as the nearest neighbors. Now, you use those in your few-shot prompt:

Prompt

Q: What is the purpose of the Constitution?
A: The Constitution serves as the supreme law of the land, outlining the structure of government and the rights of citizens.
Q: What is the Miranda warning?
A: The Miranda warning is a statement police must give to inform suspects of their rights to remain silent and have an attorney.
Q: What is the principle of double jeopardy?
A: Double jeopardy is a legal principle that prohibits someone from being tried twice for the same crime.
Q: What is prior restraint?
A:

AI Output

Prior restraint is a legal concept where the government restricts or prevents publication or speech before it happens, rather than punishing it after it occurs.

Limitations of K-Nearest Neighbor

For a given prompt, KNN computes the similarity between that prompt and every prompt in the dataset, so it can be computationally expensive for large datasets. Also, there is no single correct k value: the choice is a judgment call and can be very difficult if you don't know your dataset well.
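To make the cost concrete: a brute-force search compares the query against every stored prompt, so each query costs N similarity computations for a dataset of N exemplars. The sketch below counts those computations. It assumes a word-overlap similarity as a stand-in (no similarity measure is specified here), uses synthetic placeholder prompts, and the class name `BruteForceKNN` is hypothetical.

```python
import heapq
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return frozenset(re.findall(r"[a-z0-9]+", text.lower()))

class BruteForceKNN:
    """Brute-force nearest-neighbor search that counts similarity computations."""

    def __init__(self, prompts):
        # Tokenize the stored prompts once, not on every query.
        self.token_sets = [tokenize(p) for p in prompts]
        self.comparisons = 0

    def _similarity(self, a, b):
        """Jaccard word overlap (a stand-in similarity measure)."""
        self.comparisons += 1
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    def query(self, text, k):
        """Return the indices of the k stored prompts most similar to text."""
        q = tokenize(text)
        # nlargest keeps only the best k candidates instead of sorting all N,
        # but it still has to score every stored prompt once.
        return heapq.nlargest(
            k,
            range(len(self.token_sets)),
            key=lambda i: self._similarity(q, self.token_sets[i]),
        )

# Synthetic placeholder dataset of N = 1000 prompts.
prompts = [f"What is item {i}?" for i in range(1000)]
index = BruteForceKNN(prompts)

top = index.query("What is item 42?", k=3)
print(top[0], index.comparisons)  # 42 1000
```

The count grows linearly with the dataset size for every query, which is the cost this section refers to.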

Conclusion

KNN is part of a family of algorithms that choose the exemplars most similar to the prompt at hand to boost performance in few-shot prompting. Though it is effective at improving performance, it can be computationally expensive, so it is best suited to tasks that require high specificity and are too complex for simpler exemplar-selection methods.

Footnotes

  1. Shi, W., Michael, J., Gururangan, S., & Zettlemoyer, L. (2022). kNN-Prompt: Nearest Neighbor Zero-Shot Inference. https://arxiv.org/abs/2205.13792

  2. Liu, J., Shen, D., Zhang, Y., Dolan, B., Carin, L., & Chen, W. (2021). What Makes Good In-Context Examples for GPT-3? https://arxiv.org/abs/2101.06804

Copyright © 2024 Learn Prompting.