
Prompt Tuning with Soft Prompts


Sander Schulhoff

Takeaways
  • Soft prompts are prompt vectors whose weights have been optimized for specific tasks, making them uninterpretable to humans.
  • This stands in contrast to model fine-tuning, in which the weights of the model are adjusted.

Prompt Tuning is an alternative to model fine-tuning that adapts a large language model (LLM) for specific tasks by updating only a small set of additional parameters called soft prompts, while keeping the main model weights frozen.

Prompt tuning lets you use the same frozen model for all tasks: you just prepend the proper soft prompt at inference time, which makes it easy to batch inputs from different tasks together, much the same advantage that regular prompting has. Additionally, soft prompts trained for a single model across multiple tasks are typically given the same token length, which keeps mixed-task batches uniform (see the sketch below).
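To make that concrete, here is a minimal sketch of mixed-task batching. The task names and tensor sizes are illustrative assumptions, not anything from the original paper: each example simply picks up its task's soft prompt, and equal prompt lengths keep the stacked batch rectangular.

```python
import torch

# Hypothetical per-task soft prompts: 20 virtual tokens, 768-dim
# embeddings (sizes are illustrative only).
prompts = {
    "sentiment": torch.randn(20, 768),
    "summarize": torch.randn(20, 768),
}

# One frozen model can serve a mixed-task batch: each example uses
# its own task's soft prompt, and equal prompt lengths keep the
# stacked tensor rectangular.
batch_tasks = ["sentiment", "summarize", "sentiment"]
batch_prompts = torch.stack([prompts[t] for t in batch_tasks])  # (3, 20, 768)
```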

Model Tuning vs Prompt Tuning: in model tuning, you fine-tune a separate copy of the same pre-trained model for each task. This leaves you with several different models, across which you can't easily batch inputs.

What are Soft Prompts?

Unlike traditional text-based prompts, which are discrete strings of text, soft prompts are continuous vectors (embeddings) that are learned during training. These vectors capture task-specific information and act as cues for the frozen model. Because the core model remains unchanged, the soft prompts can be efficiently tuned for different tasks without the overhead of full model fine-tuning.
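The following sketch illustrates the distinction, using illustrative GPT-2-like vocabulary and embedding sizes: a discrete prompt can only select existing rows of the model's frozen embedding table, while a soft prompt is a freely learnable tensor in the same embedding space.

```python
import torch

vocab_size, embed_dim = 50257, 768  # illustrative GPT-2-like sizes

# A discrete prompt is limited to rows of the frozen embedding table.
embedding_table = torch.nn.Embedding(vocab_size, embed_dim)
discrete_prompt = embedding_table(torch.tensor([3923, 338]))  # arbitrary token ids

# A soft prompt is a free continuous tensor in the same space; its
# learned values need not match any real token's embedding, which is
# why soft prompts are not interpretable as text.
soft_prompt = torch.nn.Parameter(torch.randn(20, embed_dim) * 0.02)
```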

How Prompt Tuning Works

  1. Begin with a large pre-trained language model (e.g., T5, GPT) that has been trained on a vast dataset.

  2. The main model parameters are kept fixed. A small set of soft prompt embeddings is added to the input. These are trainable vectors that serve as task-specific instructions.

  3. The soft prompt embeddings are concatenated with (typically prepended to) the embedded representation of the input text. This creates a combined input that provides both the original text and the task cues from the soft prompts.

  4. During training on a specific task, backpropagation updates only the soft prompt parameters. The rest of the model remains unchanged, which minimizes computational and storage requirements.

  5. Once trained, the soft prompts can be stored and later prepended to inputs at inference time, allowing the same frozen model to be used across multiple tasks by simply switching the soft prompt, as the sketch below shows.
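Putting the five steps together, here is a minimal PyTorch sketch. It assumes a generic Hugging Face causal language model ("gpt2" is just a stand-in) and illustrative hyperparameters; it is not the exact setup from the original paper.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM with input embeddings works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Step 2: freeze every weight of the pre-trained model.
for param in model.parameters():
    param.requires_grad = False

# Trainable soft prompt: 20 "virtual tokens" in embedding space.
num_virtual_tokens = 20
embed_dim = model.get_input_embeddings().embedding_dim
soft_prompt = torch.nn.Parameter(torch.randn(num_virtual_tokens, embed_dim) * 0.02)

def forward_with_soft_prompt(input_ids):
    # Step 3: concatenate the soft prompt with the embedded input tokens.
    token_embeds = model.get_input_embeddings()(input_ids)        # (B, T, D)
    prompt = soft_prompt.unsqueeze(0).expand(input_ids.size(0), -1, -1)
    inputs_embeds = torch.cat([prompt, token_embeds], dim=1)      # (B, P+T, D)
    return model(inputs_embeds=inputs_embeds)

# Step 4: the optimizer sees only the soft prompt parameters.
optimizer = torch.optim.Adam([soft_prompt], lr=0.3)

# Step 5: at inference, load a task's trained soft prompt and reuse
# the same frozen model.
input_ids = tokenizer("What's 2+2?", return_tensors="pt")["input_ids"]
logits = forward_with_soft_prompt(input_ids).logits
```

In practice you would compute a task loss on the model's output and call `loss.backward()` followed by `optimizer.step()`; only the soft prompt tensor changes, so each task costs just `num_virtual_tokens × embed_dim` stored parameters.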

How It Works in Practice

To understand the basic logic behind soft prompting, let's think about how model inference works on a given prompt, such as:

What's 2+2?
  1. It might be tokenized as What, 's, 2, +, 2, ?.
  2. Each token is then converted to an embedding, a vector of values.
  3. These embedding vectors can be treated as model parameters in their own right: the model can be trained further while adjusting only the weights of these prompt vectors, as the sketch below shows.
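Here is a hedged sketch of those three steps using GPT-2's tokenizer; the exact token split varies by tokenizer, so treat the printed tokens as indicative only.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Step 1: the prompt becomes discrete tokens (split varies by tokenizer).
ids = tokenizer("What's 2+2?")["input_ids"]
print(tokenizer.convert_ids_to_tokens(ids))

# Step 2: each token id is looked up as a vector of values.
vectors = model.get_input_embeddings()(torch.tensor([ids]))  # (1, T, D)

# Step 3: a soft prompt treats such vectors as trainable parameters,
# while all other model weights stay frozen.
trainable_prompt = torch.nn.Parameter(vectors.squeeze(0).detach().clone())
```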

Main Benefits of Prompt Tuning

| Benefit | Description |
| --- | --- |
| Efficient parameter usage | Requires 0.01%–0.1% of the parameters compared to fine-tuning. |
| Scales with model size | Performance improves as model size increases. |
| Enables multi-task learning | A single model can handle multiple tasks by switching soft prompts. |
| Better generalization | Improves zero-shot learning and robustness to domain shifts. |
| Storage & compute savings | No need to store separate fine-tuned models for each task. |

Results

Prompt tuning performs better with larger models, and larger models also require fewer soft prompt tokens. In general, using more than 20 soft prompt tokens does not yield significant performance gains.

Conclusion

Prompt tuning is a scalable and efficient alternative to full fine-tuning for adapting large language models. By leveraging soft prompts, it enables multi-task learning, reduces storage and inference costs, and achieves performance competitive with full fine-tuning, especially as model size increases.


Sander Schulhoff

Sander Schulhoff is the Founder of Learn Prompting and an ML Researcher at the University of Maryland. He created the first open-source Prompt Engineering guide, reaching 3M+ people and teaching them to use tools like ChatGPT. Sander also led the team behind The Prompt Report, the most comprehensive study of prompting ever done, co-authored with researchers from the University of Maryland, OpenAI, Microsoft, Google, Princeton, Stanford, and other leading institutions. This 76-page survey analyzed 1,500+ academic papers and covered 200+ prompting techniques.