Last updated on August 7, 2024
The post-prompting defense simply puts the user input before the prompt. Take this prompt as an example:
Translate the following to French: {{user_input}}
It can be improved with post-prompting:
{{user_input}}
Translate the above text to French.
This can help since ignore the above instruction...
doesn't work as well. Even though a user could say ignore the below instruction...
instead, LLMs often will follow the last instruction they see.
Mark, C. (2022). Talking to machines: prompt engineering & injection. https://artifact-research.com/artificial-intelligence/talking-to-machines-prompt-engineering-injection/ β©