
Prompt Tuning with Soft Prompts

Last updated on March 3, 2025

Takeaways

Soft prompts are learned prompt vectors whose values have been optimized for a specific task. Because they are not tied to real words, humans cannot read them.
This stands in contrast to model fine-tuning, in which the weights of the model itself are adjusted.

Prompt Tuning is an alternative to model fine-tuning that adapts a large language model (LLM) for specific tasks by updating only a small set of additional parameters called soft prompts, while keeping the main model weights frozen.

Prompt tuning lets you use the same model for all tasks. You only need to prepend the right soft prompt to the input at inference time, which makes batching across different tasks easier. Regular text prompting has the same advantage. In addition, soft prompts trained on a single model for multiple tasks will often have the same token length.

Model Tuning vs Prompt Tuning. In model tuning, you fine-tune the same pre-trained model separately for each task. This gives you a different model for each task, and inputs for different tasks cannot easily be batched together.
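As a rough illustration of why one frozen model plus per-task soft prompts makes mixed-task batching simple, the sketch below builds a batch in which each example receives its own task's prompt. The task names, the `soft_prompts` dictionary, the `build_mixed_batch` helper, and the tiny 3-dimensional vectors are all hypothetical. Placing the prompt in front of the input follows Lester et al. (2021).

```python
# Hypothetical learned soft prompts, one per task, all with the same length.
soft_prompts = {
    "sentiment": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    "summarize": [[0.9, 0.8, 0.7], [0.6, 0.5, 0.4]],
}


def build_mixed_batch(examples):
    """Prepend each example's task prompt to its input embeddings.

    examples: list of (task_name, input_embeddings) pairs.
    The result can be fed to one frozen model in a single batch.
    """
    return [soft_prompts[task] + input_embeds for task, input_embeds in examples]


batch = build_mixed_batch([
    ("sentiment", [[1.0, 0.0, 0.0]]),
    ("summarize", [[0.0, 1.0, 0.0]]),
])

for row in batch:
    print(len(row), row[:2])
```

Because both prompts have the same number of vectors, the two rows end up the same length here and the real input starts at the same offset in each row.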

What are Soft Prompts?

Unlike traditional text-based prompts, which are discrete strings of text, soft prompts are continuous vectors (embeddings) that are learned during training. These vectors capture task-specific information and act as cues for the frozen model. Because the core model remains unchanged, the soft prompts can be efficiently tuned for different tasks without the overhead of full model fine-tuning.
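The difference between a discrete text prompt and a continuous soft prompt can be sketched in a few lines. This is only a toy, assuming a made-up three-word vocabulary with 2-dimensional embeddings; `embed_text_prompt`, `nearest_token`, and the value of `soft_vector` are hypothetical stand-ins, not output from a real model.

```python
# Frozen embedding table: a text prompt can only pick rows that exist here.
embedding_table = {
    "good": [1.0, 0.0],
    "bad": [-1.0, 0.0],
    "movie": [0.0, 1.0],
}


def embed_text_prompt(tokens):
    """Discrete prompt: every position is one of the vocabulary vectors."""
    return [embedding_table[tok] for tok in tokens]


def nearest_token(vector):
    """Map a continuous vector to the closest vocabulary token (lossy)."""
    def sq_dist(tok):
        return sum((a - b) ** 2 for a, b in zip(vector, embedding_table[tok]))
    return min(embedding_table, key=sq_dist)


hard = embed_text_prompt(["good", "movie"])

# A soft prompt vector is free to sit anywhere in the embedding space.
soft_vector = [0.6, 0.7]  # stands in for a value found by training

print(soft_vector in embedding_table.values())  # False
print(nearest_token(soft_vector))               # movie
```

The learned vector matches no vocabulary entry, and snapping it to the nearest token discards information, which is one way to see why soft prompts are hard to interpret.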

How Prompt Tuning Works

Begin with a large pre-trained language model (e.g., T5, GPT) that has been trained on a vast dataset.
The main model parameters are kept fixed. A small set of soft prompt embeddings is added to the input. These are trainable vectors that serve as task-specific instructions.
The soft prompt embeddings are prepended to the embedded representation of the input text, that is, to the sequence of token embeddings. This creates a combined input that provides both the original text and the task cues from the soft prompts.
During training on a specific task, backpropagation updates only the soft prompt parameters. The rest of the model remains unchanged, which minimizes computational and storage requirements.
Once trained, the soft prompts can be stored and later prepended to inputs at inference time, allowing the same frozen model to be used across multiple tasks by simply switching the soft prompt.
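The steps above can be sketched with a deliberately tiny stand-in for the frozen model. Everything here is an assumption made for illustration: the "model" is just mean pooling followed by a fixed linear readout (`FROZEN_W`, `FROZEN_B`), the two training examples are invented, and the gradient is written out by hand rather than computed by a deep learning framework. A real setup would use an actual LLM and automatic differentiation.

```python
import math

# Frozen "model" weights: stand-ins for the pre-trained LLM, never updated.
FROZEN_W = (1.0, -1.0)
FROZEN_B = 0.0


def forward(soft_prompt, input_embeds):
    """Prepend the soft prompt, mean-pool, apply the frozen linear readout."""
    seq = soft_prompt + input_embeds
    pooled = [sum(vec[d] for vec in seq) / len(seq) for d in range(len(FROZEN_W))]
    logit = sum(w * x for w, x in zip(FROZEN_W, pooled)) + FROZEN_B
    return 1.0 / (1.0 + math.exp(-logit))


def loss(soft_prompt, data):
    """Mean binary cross-entropy over the dataset."""
    total = 0.0
    for x, y in data:
        p = forward(soft_prompt, x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)


def train_step(soft_prompt, data, lr):
    """One gradient step that changes only the soft prompt vectors."""
    grads = [[0.0] * len(FROZEN_W) for _ in soft_prompt]
    for x, y in data:
        p = forward(soft_prompt, x)
        seq_len = len(soft_prompt) + len(x)
        for i in range(len(soft_prompt)):
            for d, w in enumerate(FROZEN_W):
                # d(loss)/d(logit) = p - y; d(logit)/d(prompt[i][d]) = w / seq_len
                grads[i][d] += (p - y) * w / seq_len / len(data)
    return [[v - lr * g for v, g in zip(row, grad_row)]
            for row, grad_row in zip(soft_prompt, grads)]


# Invented task: the frozen model alone scores both examples below 0.5.
data = [
    ([[0.0, 1.0], [0.0, 1.0]], 1),
    ([[0.0, 5.0], [0.0, 5.0]], 0),
]

soft_prompt = [[0.0, 0.0], [0.0, 0.0]]  # two trainable prompt vectors
initial_loss = loss(soft_prompt, data)
for _ in range(200):
    soft_prompt = train_step(soft_prompt, data, lr=0.5)
final_loss = loss(soft_prompt, data)

print(initial_loss, final_loss)
```

Only `soft_prompt` changes between steps, so storing this one small list is enough to reuse the frozen model for the task later.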

How It Works in Practice

To understand the basic logic behind soft prompting, let's think about how model inference works on a given prompt:

Prompt

What's 2+2?

It might be tokenized as What, 's, 2, +, 2, ?.
Then, each token will be converted to a vector of values (an embedding).
These vectors can themselves be treated as trainable parameters. With the rest of the model frozen, training can then adjust only the values of these prompt vectors.
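To make the walk-through concrete, here is a minimal sketch of the path from tokens to the model's input. The token split is the one given above; the vocabulary ids, the random `embedding_table`, the embedding size, and the three-vector `soft_prompt` are hypothetical placeholders rather than values from any real tokenizer or model.

```python
import random

random.seed(0)
EMBED_DIM = 4  # hypothetical, kept tiny for readability

tokens = ["What", "'s", "2", "+", "2", "?"]

# Assign ids in order of first appearance (a stand-in for a real vocabulary).
vocab = {tok: i for i, tok in enumerate(dict.fromkeys(tokens))}
token_ids = [vocab[tok] for tok in tokens]

# Frozen embedding table: one vector per vocabulary entry.
embedding_table = [[random.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in vocab]
input_embeds = [embedding_table[i] for i in token_ids]

# Trainable soft prompt: extra vectors placed in front of the input.
soft_prompt = [[0.0] * EMBED_DIM for _ in range(3)]
model_input = soft_prompt + input_embeds

# Only the soft prompt positions would receive gradient updates.
trainable_mask = [True] * len(soft_prompt) + [False] * len(input_embeds)

print(token_ids)         # [0, 1, 2, 3, 2, 4]
print(len(model_input))  # 9
```

The two occurrences of `2` share one id and therefore one embedding vector, while each soft prompt position has its own independent vector.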

Main Benefits of Prompt Tuning

Benefit	Description
Efficient parameter usage	Trains only about 0.01%–0.1% as many parameters as full fine-tuning.
Scales with model size	Performance improves as model size increases.
Enables multi-task learning	A single model can handle multiple tasks by switching soft prompts.
Better generalization	Improves zero-shot learning and robustness to domain shifts.
Storage & compute savings	No need to store separate fine-tuned models for each task.
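The parameter and storage rows in the table follow from simple arithmetic: a soft prompt has prompt length times embedding dimension trainable values. The sketch below works one case through with illustrative round numbers (100 prompt tokens, an embedding size of 1,024, a 770-million-parameter model, 10 tasks). These figures are assumptions chosen for the example, not measurements from the article.

```python
def soft_prompt_params(prompt_len, embed_dim):
    """Number of trainable values in one soft prompt."""
    return prompt_len * embed_dim


def percent_of_model(prompt_len, embed_dim, model_params):
    """Soft prompt size as a percentage of the full model's parameters."""
    return 100 * soft_prompt_params(prompt_len, embed_dim) / model_params


# Illustrative assumptions, not figures from the article.
PROMPT_LEN = 100
EMBED_DIM = 1024
MODEL_PARAMS = 770_000_000
NUM_TASKS = 10

per_task = soft_prompt_params(PROMPT_LEN, EMBED_DIM)
pct = percent_of_model(PROMPT_LEN, EMBED_DIM, MODEL_PARAMS)

# Extra values to store for all tasks under each approach.
prompt_tuning_storage = NUM_TASKS * per_task
fine_tuning_storage = NUM_TASKS * MODEL_PARAMS

print(per_task)          # 102400
print(f"{pct:.4f}%")     # 0.0133%
print(prompt_tuning_storage, fine_tuning_storage)
```

Under these assumptions, ten tasks cost about a million stored values with prompt tuning, compared with 7.7 billion for ten fully fine-tuned copies.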

Results

Prompt tuning performs better with larger models. Larger models also require fewer soft prompt tokens. In either case, going beyond 20 soft prompt tokens does not yield significant performance gains.

Conclusion

Prompt tuning is a scalable and efficient alternative to full fine-tuning for adapting large language models. By leveraging soft prompts, it enables multi-task learning, reduces storage and inference costs, and achieves performance competitive with fine-tuning, especially as model size increases.

Sander Schulhoff

Sander Schulhoff is the CEO of HackAPrompt and Learn Prompting. He created the first Prompt Engineering guide on the internet, two months before ChatGPT was released, which has taught 3 million people how to prompt ChatGPT. He also partnered with OpenAI to run the first AI Red Teaming competition, HackAPrompt, which was 2x larger than the White House's subsequent AI Red Teaming competition. Today, HackAPrompt partners with the Frontier AI labs to produce research that makes their models more secure. Sander's background is in Natural Language Processing and deep reinforcement learning. He recently led the team behind The Prompt Report, the most comprehensive study of prompt engineering ever done. This 76-page survey, co-authored with OpenAI, Microsoft, Google, Princeton, Stanford, and other leading institutions, analyzed 1,500+ academic papers and covered 200+ prompting techniques.

Footnotes

Lester, B., Al-Rfou, R., & Constant, N. (2021). The Power of Scale for Parameter-Efficient Prompt Tuning.
Khashabi, D., Lyu, S., Min, S., Qin, L., Richardson, K., Welleck, S., Hajishirzi, H., Khot, T., Sabharwal, A., Singh, S., & Choi, Y. (2021). Prompt Waywardness: The Curious Case of Discretized Interpretation of Continuous Prompts.
