Announcing our new Course: AI Red-Teaming and AI Safety Masterclass
Check it out →The sandwich defense1 involves sandwiching user input between two prompts. Take the following prompt as an example:
Translate the following to French: {{user_input}}
It can be improved with the sandwich defense:
Translate the following to French:
{{user_input}}
Remember, you are translating the above text to French.
This defense should be more secure than post-prompting, but is known to be vulnerable to a defined dictionary attack. See the defined dictionary attack for more information.