Prompt Tuning with Soft Prompts

🟦 This article is rated medium
Reading Time: 3 minutes
Last updated on March 3, 2025

Sander Schulhoff

Takeaways
  • Soft prompts are prompt vectors whose weights have been optimized for specific tasks, making them uninterpretable to humans.
  • This stands in contrast to model fine-tuning, in which the weights of the model are adjusted.

Prompt Tuning is an alternative to model fine-tuning that adapts a large language model (LLM) for specific tasks by updating only a small set of additional parameters called soft prompts, while keeping the main model weights frozen.

Prompt tuning lets you use the same frozen model for every task: you only need to append the appropriate soft prompt at inference time, which makes it easy to batch examples from different tasks together. This is essentially the same advantage that regular prompting offers. Additionally, soft prompts trained for a given model are typically the same token length across tasks, which further simplifies batching.

Model Tuning vs Prompt Tuning. In model tuning, you fine-tune the same pretrained model separately for each task. This leaves you with several different models, and inputs for different tasks cannot easily be batched together.

What are Soft Prompts?

Unlike traditional text-based prompts, which are discrete strings of text, soft prompts are continuous vectors (embeddings) that are learned during training. These vectors capture task-specific information and act as cues for the frozen model. Because the core model remains unchanged, the soft prompts can be efficiently tuned for different tasks without the overhead of full model fine-tuning.
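
Concretely, a soft prompt is nothing more than a small matrix of learned floating-point values, one row per "virtual token." Below is a minimal PyTorch sketch; the sizes (20 virtual tokens, 768-dimensional embeddings) are illustrative assumptions rather than fixed requirements.

```python
import torch
import torch.nn as nn

# A soft prompt is a small trainable matrix: one learned embedding per
# "virtual token". The sizes below are illustrative assumptions.
num_virtual_tokens = 20   # length of the soft prompt
embedding_dim = 768       # must match the frozen model's embedding size

soft_prompt = nn.Parameter(torch.randn(num_virtual_tokens, embedding_dim) * 0.02)

# Unlike a text prompt, these values are continuous numbers rather than
# readable tokens, which is why soft prompts are uninterpretable to humans.
print(soft_prompt.shape)   # torch.Size([20, 768])
```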

How Prompt Tuning Works

  1. Begin with a large pre-trained language model (e.g., T5, GPT) that has been trained on a vast dataset.

  2. The main model parameters are kept fixed. A small set of soft prompt embeddings is added to the input. These are trainable vectors that serve as task-specific instructions.

  3. The soft prompt embeddings are appended to the tokenized representation of the input text. This creates a combined input that provides both the original text and the task cues from the soft prompts.

  4. During training on a specific task, backpropagation updates only the soft prompt parameters. The rest of the model remains unchanged, which minimizes computational and storage requirements (a minimal code sketch of this loop follows these steps).

  5. Once trained, the soft prompts can be stored and later appended to inputs at inference time, allowing the same frozen model to be used across multiple tasks by simply switching the soft prompt.
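
To make steps 2 through 4 concrete, here is a minimal, self-contained PyTorch sketch. FrozenToyLM is a toy stand-in for a real frozen LLM, and the layer sizes, batch shapes, and learning rate are arbitrary assumptions; the key point is that the optimizer only ever sees the soft prompt parameters.

```python
import torch
import torch.nn as nn

# Minimal prompt-tuning sketch. FrozenToyLM is a toy stand-in for a real
# frozen LLM; all sizes here are arbitrary illustrative choices.
class FrozenToyLM(nn.Module):
    def __init__(self, vocab_size=1000, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        encoder_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, input_embeds):
        return self.lm_head(self.backbone(input_embeds))

model = FrozenToyLM()
for p in model.parameters():                      # step 2: keep the main model fixed
    p.requires_grad = False

num_virtual_tokens, d_model = 20, 64
soft_prompt = nn.Parameter(torch.randn(num_virtual_tokens, d_model) * 0.02)
optimizer = torch.optim.Adam([soft_prompt], lr=1e-3)  # step 4: only the soft prompt is trainable

# A dummy batch of token ids and labels standing in for real task data.
input_ids = torch.randint(0, 1000, (8, 16))
labels = torch.randint(0, 1000, (8, 16 + num_virtual_tokens))

token_embeds = model.embed(input_ids)                  # step 3: embed the input text
prompt_embeds = soft_prompt.unsqueeze(0).expand(input_ids.size(0), -1, -1)
inputs_embeds = torch.cat([prompt_embeds, token_embeds], dim=1)  # attach the soft prompt

logits = model(inputs_embeds)
loss = nn.functional.cross_entropy(logits.reshape(-1, logits.size(-1)), labels.reshape(-1))

optimizer.zero_grad()
loss.backward()
optimizer.step()    # updates soft_prompt only; the frozen model is untouched
```

Because the frozen model's weights never change, the only task-specific artifact that needs to be stored is the small soft_prompt tensor.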

How It Works in Practice

To understand the basic logic behind soft prompting, let's think about how model inference works on a given prompt:

Prompt: "What's 2+2?"
  1. It might be tokenized as What, 's, 2, +, 2, ?.
  2. Then, each token is converted into an embedding, i.e., a vector of numerical values.
  3. These embedding vectors can themselves be treated as trainable parameters: the model can be trained further while adjusting only these prompt vectors and leaving the rest of the model frozen (a short sketch of this idea follows the list).
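
As a short sketch of these three steps, the snippet below uses the Hugging Face transformers library with GPT-2 purely as an illustrative stand-in for "the model"; the prompt's embedding vectors become ordinary trainable parameters while the model itself stays frozen.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Illustrative only: GPT-2 stands in for "the model"; any LM with an
# embedding layer would work the same way.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")
for p in model.parameters():
    p.requires_grad = False                    # the model itself stays frozen

# 1. Tokenize the prompt into discrete tokens.
ids = tokenizer("What's 2+2?", return_tensors="pt")["input_ids"]

# 2. Convert each token into its embedding vector.
with torch.no_grad():
    prompt_embeds = model.get_input_embeddings()(ids)   # shape: (1, num_tokens, 768)

# 3. Treat those vectors as trainable parameters: a soft prompt initialized
#    from the text prompt. Training now adjusts only these values.
soft_prompt = torch.nn.Parameter(prompt_embeds.squeeze(0).clone())
optimizer = torch.optim.Adam([soft_prompt], lr=1e-3)
```

In practice, parameter-efficient fine-tuning libraries implement this pattern for you, but the mechanics reduce to the few lines above.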

Main Benefits of Prompt Tuning

  • Efficient parameter usage: requires only 0.01% to 0.1% as many trainable parameters as full fine-tuning.
  • Scales with model size: performance improves as model size increases.
  • Enables multi-task learning: a single model can handle multiple tasks by switching soft prompts.
  • Better generalization: improves zero-shot learning and robustness to domain shifts.
  • Storage & compute savings: no need to store a separate fine-tuned model for each task.

Results

Prompt tuning performs better as model size grows, and larger models also require fewer soft prompt tokens to reach good performance. In any case, using more than 20 soft prompt tokens does not yield significant additional gains.

Conclusion

Prompt tuning is a scalable and efficient alternative to full fine-tuning for adapting large language models. By leveraging soft prompts, it enables multi-task learning, reduces storage and inference costs, and achieves performance competitive with full fine-tuning, especially as model size increases.

Footnotes

  1. Lester, B., Al-Rfou, R., & Constant, N. (2021). The Power of Scale for Parameter-Efficient Prompt Tuning.

  2. Khashabi, D., Lyu, S., Min, S., Qin, L., Richardson, K., Welleck, S., Hajishirzi, H., Khot, T., Sabharwal, A., Singh, S., & Choi, Y. (2021). Prompt Waywardness: The Curious Case of Discretized Interpretation of Continuous Prompts.

Sander Schulhoff

Sander Schulhoff is the Founder of Learn Prompting and an ML Researcher at the University of Maryland. He created the first open-source Prompt Engineering guide, reaching 3M+ people and teaching them to use tools like ChatGPT. Sander also led a team behind Prompt Report, the most comprehensive study of prompting ever done, co-authored with researchers from the University of Maryland, OpenAI, Microsoft, Google, Princeton, Stanford, and other leading institutions. This 76-page survey analyzed 1,500+ academic papers and covered 200+ prompting techniques.