Announcing our new Course: AI Red-Teaming and AI Safety Masterclass

Check it out →
🔓 Prompt Hacking🟢 Defensive Measures🟢 Introduction

🟢 Introduction

Last updated on August 7, 2024 by Sander Schulhoff

Preventing prompt injection can be extremely difficult, and there exist few robust defenses against it12. However, there are some commonsense solutions. For example, if your application does not need to output free-form text, do not allow such outputs. There are many different ways to defend a prompt. We will discuss some of the most common ones here.

This chapter covers additional commonsense strategies like filtering out words. It also covers prompt improvement strategies (instruction defense, post-prompting, different ways to enclose user input, and XML tagging). Finally, we discuss using an LLM to evaluate output and some more model specific approaches.

Footnotes

  1. Crothers, E., Japkowicz, N., & Viktor, H. (2022). Machine Generated Text: A Comprehensive Survey of Threat Models and Detection Methods.

  2. Goodside, R. (2022). GPT-3 Prompt Injection Defenses. https://twitter.com/goodside/status/1578278974526222336?s=20&t=3UMZB7ntYhwAk3QLpKMAbw

Edit this page
Word count: 0
Copyright © 2024 Learn Prompting.