Introduction
- Image Prompting Techniques: This chapter outlines how to create consistent, high-quality images with AI models like DALLE and Stable Diffusion, focusing on effective image prompting techniques.
Figuring out the best prompt to create a perfect image is a particular challenge. Research into methods for doing so is not as developed as research into text prompting. This may be due to the inherent challenges of creating outputs that are fundamentally subjective and often lack good accuracy metrics. However, fear not, as the image prompting community has made great discoveries about how to prompt various image models.
This guide covers basic image prompting techniques, and we highly encourage you to look at the great resources at the end of the chapter. Additionally, we provide an example of the end-to-end image prompting process below.
An Example of Image Prompting
Here I will go through an example of how I created the images for the front page of this course. I had been experimenting with a low poly style for a deep reinforcement learning neural radiance field project. I liked the low poly style and wanted to use it for this course's images.
I wanted an astronaut, a rocket, and a computer for the images on the front page.
I did a bunch of research into how to create low poly images, on r/StableDiffusion and other sites, but couldn't find anything super helpful.
I decided to just start with DALLE and the following prompt, and see what happened.
Prompt
I thought these results were pretty decent for a first try; I particularly liked the bottom left rocket.
Next, I wanted a computer in the same style:
Prompt
Finally, I needed an astronaut! This prompt seemed to do the trick:
Prompt
I thought the second one was decent.
Now I had an astronaut, a rocket, and a computer. I was happy with them, so I put them on the front page. After a few days and input from my friends, I realized the style just wasn't consistent.
I did some more research on r/StableDiffusion and found people using the word isometric. I decided to try that out, using Stable Diffusion instead of DALLE. I also realized that I needed to add more modifiers to my prompt to constrain the style. I tried this prompt:
Prompt
These weren't great, so I decided to start on the rocket instead:
Prompt
These are not particularly good, but after a bit of iterating around here, I ended up with:
Now I needed a better laptop:
Prompt
I got some inconsistent results; I like the bottom right one, but I decided to go in a different direction.
Prompt
This wasn't quite right. Let's try something magical and glowing.
Prompt
I liked these, but wanted the stone in the middle of the screen.
Prompt
Somewhere around here, I used SD's ability to have a previous image provide some influence for future images. Thus I arrived at:
Finally, I was on to the astronaut.
Prompt
At this point, I was sufficiently happy with the style consistency between my three images to use them on the website. The main takeaways for me were that this was a very iterative, research-heavy process, and I had to modify my expectations and ideas as I experimented with different prompts and models.
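One way to keep a style consistent across several subjects is to hold the style modifiers fixed and vary only the subject. The snippet below is a minimal sketch of that idea, not the prompts actually used for the course images: `build_prompt` and `STYLE_MODIFIERS` are hypothetical names, and "white background" is an assumed modifier added for illustration ("low poly" and "isometric" come from the walkthrough above).

```python
from typing import Sequence

# Assumed, illustrative modifier set. Only "low poly" and "isometric"
# are mentioned in the walkthrough; "white background" is made up here.
STYLE_MODIFIERS = ("low poly", "isometric", "white background")


def build_prompt(subject: str, modifiers: Sequence[str] = STYLE_MODIFIERS) -> str:
    """Join one subject with a shared list of style modifiers."""
    # Drop empty modifiers and trim stray whitespace before joining.
    parts = [subject.strip()] + [m.strip() for m in modifiers if m.strip()]
    return ", ".join(parts)


# The same modifiers are reused for every subject, so only the subject changes.
subjects = ["a rocket", "a laptop", "an astronaut"]
prompts = [build_prompt(s) for s in subjects]

for p in prompts:
    print(p)
```

Keeping the modifiers in one place means a style change (for example, swapping in a different keyword found during research) applies to every image prompt at once.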
Sander Schulhoff
Sander Schulhoff is the Founder of Learn Prompting and an ML Researcher at the University of Maryland. He created the first open-source Prompt Engineering guide, reaching 3M+ people and teaching them to use tools like ChatGPT. Sander also led a team behind Prompt Report, the most comprehensive study of prompting ever done, co-authored with researchers from the University of Maryland, OpenAI, Microsoft, Google, Princeton, Stanford, and other leading institutions. This 76-page survey analyzed 1,500+ academic papers and covered 200+ prompting techniques.
Footnotes
- Parsons, G. (2022). The DALLE 2 Prompt Book. https://dallery.gallery/the-dalle-2-prompt-book/
- Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2021). High-Resolution Image Synthesis with Latent Diffusion Models.
- Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., & Chen, M. (2022). Hierarchical Text-Conditional Image Generation with CLIP Latents.