🟒 Prompt Learning for Vision-Language Models

🟒 This article is rated easy
Reading Time: 1 minute
Last updated on October 1, 2024

Valeriia Kuka

What is Learning to Prompt for Vision-Language Models?

In vision-language models like CLIP, learning to prompt (or prompt learning)[1] is a method for improving how models handle visual recognition tasks by optimizing how they are "prompted" to process images and text. In other words, it's prompt engineering tailored to vision-language models. Typically, vision-language models align images and text in a shared feature space, allowing the models to classify new images by comparing them with text descriptions rather than relying on a fixed set of pre-defined categories.
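
To make the shared-feature-space idea concrete, here is a minimal zero-shot classification sketch using CLIP through the Hugging Face transformers library. The checkpoint name, candidate descriptions, and example image URL are illustrative choices, not part of this article:

```python
# Minimal zero-shot classification sketch with CLIP (assumes
# `pip install transformers torch pillow requests`). Model name, texts,
# and image URL are illustrative, not prescribed by the article.
import requests
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# The "classes" are just text descriptions, not fixed output labels.
texts = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

# Example image from the COCO validation set, commonly used in documentation.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Image and text embeddings live in the same space; the similarity scores
# (logits_per_image) are turned into class probabilities with a softmax.
probs = outputs.logits_per_image.softmax(dim=1)
print(dict(zip(texts, probs[0].tolist())))
```

Because classification reduces to image-text similarity, swapping in different text descriptions changes what the model can recognize, which is exactly why the wording of those descriptions matters so much.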

A major challenge with these models is manual prompt engineering: finding the right words to describe each image class. This process can be time-consuming and requires expertise, because small changes in wording can significantly affect performance.
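
Prompt learning replaces this manual search with prompts that are optimized from data. In the paper cited below (Zhou et al., 2022, the CoOp method), the hand-written prompt words become continuous context vectors that are learned from a few labeled images while the vision-language model itself stays frozen. The toy sketch below illustrates only that training idea, using synthetic tensors and a frozen linear layer as a stand-in "text encoder"; it is not CLIP's real interface or the authors' implementation:

```python
# Conceptual toy of prompt learning (in the spirit of CoOp): the encoders are
# frozen stand-ins, and only the continuous "context" vectors that play the
# role of the prompt words are trained. All data here is synthetic.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
embed_dim, n_ctx, n_classes, n_images = 64, 4, 5, 100

# Frozen stand-ins for the pretrained encoders: fixed image features and a
# frozen linear "text encoder" mapping [context; class-name] embeddings
# into the shared image-text feature space.
image_features = F.normalize(torch.randn(n_images, embed_dim), dim=-1)
labels = torch.randint(0, n_classes, (n_images,))
class_name_emb = torch.randn(n_classes, embed_dim)            # frozen
text_encoder = nn.Linear(embed_dim * (n_ctx + 1), embed_dim)  # frozen
for p in text_encoder.parameters():
    p.requires_grad_(False)

# The only trainable parameters: a small set of shared context vectors that
# take the place of prompt words like "a photo of a ...".
context = nn.Parameter(0.02 * torch.randn(n_ctx, embed_dim))
optimizer = torch.optim.Adam([context], lr=1e-2)

for step in range(200):
    # Prepend the learned context to every class-name embedding.
    prompts = torch.cat(
        [context.unsqueeze(0).expand(n_classes, -1, -1),
         class_name_emb.unsqueeze(1)],
        dim=1,
    ).flatten(start_dim=1)
    text_features = F.normalize(text_encoder(prompts), dim=-1)

    # Classify by image-text similarity, exactly as in zero-shot CLIP, and
    # update only the context vectors with a cross-entropy loss.
    logits = 100.0 * image_features @ text_features.t()
    loss = F.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final training loss: {loss.item():.3f}")
```

The design choice to keep the encoders frozen is what makes this practical: only a handful of context vectors are optimized, so a few labeled examples per class are enough, and the model's general image-text alignment is preserved.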


Valeriia Kuka, Head of Content at Learn Prompting, is passionate about making AI and ML accessible. Valeriia previously grew a 60K+ follower AI-focused social media account, earning reposts from Stanford NLP, Amazon Research, Hugging Face, and AI researchers. She has also worked with AI/ML newsletters and global communities with 100K+ members and authored clear and concise explainers and historical articles.

Footnotes

  1. Zhou, K., Yang, J., Loy, C. C., & Liu, Z. (2022). Learning to Prompt for Vision-Language Models. International Journal of Computer Vision, 130(9), 2337–2348. https://doi.org/10.1007/s11263-022-01653-1