Prompt Engineering Guide

Introduction

🟢 This article is rated easy
Reading Time: 4 minutes
Last updated on August 7, 2024

Sander Schulhoff

Figuring out the best prompt to create a perfect image is a particular challenge. Research into methods for doing so is not as developed as research into text prompting. This may be due to the inherent difficulty of evaluating images, which are fundamentally subjective and often lack good accuracy metrics. However, fear not: the image prompting community has made great discoveries about how to prompt various image models.

This guide covers basic image prompting techniques, and we highly encourage you to look at the great resources at the end of the chapter. Additionally, we provide an example of the end-to-end image prompting process below.

Example

Here I will walk through an example of how I created the images for the front page of this course. I had been experimenting with a low poly style for a deep reinforcement learning neural radiance field project. I liked the style and wanted to use it for this course's images.

I wanted an astronaut, a rocket, and a computer for the images on the front page.

I did a bunch of research on r/StableDiffusion and other sites into how to create low poly images, but couldn't find anything super helpful.

I decided to just start with DALLE and the following prompt to see what happened.


Prompt


Low poly white and blue rocket shooting to the moon in front of a sparse green meadow

I thought these results were pretty decent for a first try; I particularly liked the bottom left rocket.

Next, I wanted a computer in the same style:


Prompt


Low poly white and blue computer sitting in a sparse green meadow

Finally, I needed an astronaut! This prompt seemed to do the trick:


Prompt


Low poly white and blue astronaut sitting in a sparse green meadow with low poly mountains in the background

I thought the second one was decent.

Now I had an astronaut, a rocket, and a computer. I was happy with them, so I put them on the front page. After a few days, and with input from my friends, I realized the style just wasn't consistent 😔.

I did some more research on r/StableDiffusion and found people using the word "isometric". I decided to try that out, using Stable Diffusion instead of DALLE. I also realized that I needed to add more modifiers to my prompt to constrain the style. I tried this prompt:


Prompt


A low poly world, with an astronaut in white suit and blue visor sitting in a sparse green meadow with low poly mountains in the background. Highly detailed, isometric, 4K

These weren't great, so I decided to start on the rocket instead:


Prompt


A low poly world, with a white and blue rocket blasting off from a sparse green meadow with low poly mountains in the background. Highly detailed, isometric, 4K

These are not particularly good, but after a bit of iterating, I ended up with a rocket I was happy with.

Now I needed a better laptop:


Prompt


A low poly world, with a white and blue laptop sitting in sparse green meadow with low poly mountains in the background. The screen is completely blue. Highly detailed, isometric, 4K

I got some inconsistent results. I liked the bottom right one, but I decided to go in a different direction.


Prompt


A low poly world, with a glowing white and blue gemstone sitting in a sparse green meadow with low poly mountains in the background. Highly detailed, isometric, 4K

This wasn't quite right. Let's try something magical and glowing.


Prompt


A low poly world, with a glowing white and blue gemstone magically floating in the middle of the screen above a sparse green meadow with low poly mountains in the background. Highly detailed, isometric, 4K

I liked these, but wanted the stone in the middle of the screen.


Prompt


A low poly world, with a glowing blue gemstone magically floating in the middle of the screen above a sparse green meadow with low poly mountains in the background. Highly detailed, isometric, 4K

Somewhere around here, I used Stable Diffusion's (SD's) ability to let a previous image influence future generations, and thus I arrived at a gemstone image I was happy with.
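If the Stable Diffusion feature mentioned here is image-to-image (img2img) generation, which is an assumption since the text does not name it, then the previous image's influence is typically controlled by a `strength` value between 0 and 1. The starting image is partially noised and then denoised. A higher strength adds more noise and runs more denoising steps, so the result drifts further from the starting image. The sketch below approximates how Hugging Face diffusers' img2img pipeline turns `strength` into a number of denoising steps. `img2img_steps` is a hypothetical helper, not a library function, and it ignores scheduler details such as multi-order steps.

```python
def img2img_steps(num_inference_steps: int, strength: float) -> tuple[int, int]:
    """Return (skipped_steps, denoising_steps) for an img2img run.

    Hypothetical helper approximating diffusers' img2img scheduling: a higher
    strength starts denoising earlier in the schedule (from a noisier version
    of the input image), so less of the input image survives.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    # Number of steps actually denoised, capped at the full schedule length.
    denoising = min(int(num_inference_steps * strength), num_inference_steps)
    # The remaining early (noisiest) steps are skipped entirely.
    skipped = num_inference_steps - denoising
    return skipped, denoising


for strength in (0.25, 0.5, 0.75, 1.0):
    skipped, denoising = img2img_steps(50, strength)
    print(f"strength={strength}: skip {skipped} steps, denoise {denoising} steps")
```

At a strength near 1.0 the input image is almost fully replaced by noise, so it has little influence; lower values keep more of its composition and colors.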

Finally, I moved on to the astronaut.


Prompt


A low poly world, with an astronaut in white suite and blue visor is sitting in a sparse green meadow with low poly mountains in the background. Highly detailed, isometric, 4K

At this point, I was sufficiently happy with the style consistency between my three images to use them on the website. The main takeaways for me were that this was a very iterative, research-heavy process, and that I had to modify my expectations and ideas as I experimented with different prompts and models.
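The recipe that eventually produced consistent images, a fixed set of style modifiers wrapped around a changing subject, can be sketched as a small prompt template. This is a minimal illustration, not the tooling used for these images: the names `STYLE_PREFIX`, `BACKGROUND`, `MODIFIERS`, `SUBJECTS`, and `build_prompt` are hypothetical, and the text they hold is copied from the Stable Diffusion prompts in this article.

```python
# Hypothetical prompt template: only the subject phrase changes between images,
# while the style text stays fixed so every image shares the same constraints.
STYLE_PREFIX = "A low poly world, with"
BACKGROUND = "with low poly mountains in the background"
MODIFIERS = ["Highly detailed", "isometric", "4K"]


def build_prompt(subject: str) -> str:
    """Wrap a subject phrase in the shared low poly, isometric style."""
    return f"{STYLE_PREFIX} {subject} {BACKGROUND}. {', '.join(MODIFIERS)}"


# Subject phrases taken from the prompts in this article.
SUBJECTS = {
    "rocket": "a white and blue rocket blasting off from a sparse green meadow",
    "gemstone": (
        "a glowing blue gemstone magically floating in the middle of the screen "
        "above a sparse green meadow"
    ),
    "astronaut": (
        "an astronaut in white suit and blue visor sitting in a sparse green meadow"
    ),
}

for name, subject in SUBJECTS.items():
    print(f"{name}: {build_prompt(subject)}")
```

With this structure, changing a shared modifier (for example, dropping "isometric") changes every prompt in the set at once, which is one way to keep a series of images stylistically aligned.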

Sander Schulhoff

Sander Schulhoff is the Founder of Learn Prompting and an ML Researcher at the University of Maryland. He created the first open-source Prompt Engineering guide, reaching 3M+ people and teaching them to use tools like ChatGPT. Sander also led a team behind The Prompt Report, the most comprehensive study of prompting ever done, co-authored with researchers from the University of Maryland, OpenAI, Microsoft, Google, Princeton, Stanford, and other leading institutions. This 76-page survey analyzed 1,500+ academic papers and covered 200+ prompting techniques.

🟢 Fix Deformed Generations

🟢 Midjourney

🟢 Quality Boosters

🟢 Repetition

🟢 Resources

🟢 Shot type

🟢 Style Modifiers

🟢 Weighted Terms

Footnotes

  1. Parsons, G. (2022). The DALLE 2 Prompt Book. https://dallery.gallery/the-dalle-2-prompt-book/

  2. Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2021). High-Resolution Image Synthesis with Latent Diffusion Models.

  3. Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., & Chen, M. (2022). Hierarchical Text-Conditional Image Generation with CLIP Latents.