🧠 Advanced
πŸ”“ Prompt Hacking🟒 Defensive Measures🟒 Introduction

Introduction

🟒 This article is rated easy
Reading Time: 1 minute

Last updated on August 7, 2024

Preventing prompt injection can be extremely difficult, and there exist few robust defenses against it. However, there are some common-sense solutions. For example, if your application does not need to output free-form text, do not allow such outputs. There are many different ways to defend a prompt. We will discuss some of the most common ones here.

This chapter covers additional commonsense strategies like filtering out words. It also covers prompt improvement strategies (instruction defense, post-prompting, different ways to enclose user input and XML tagging). Finally, we discuss using an LLM to evaluate output and some more model-specific approaches.

Sander Schulhoff

Sander Schulhoff is the Founder of Learn Prompting and an ML Researcher at the University of Maryland. He created the first open-source Prompt Engineering guide, reaching 3M+ people and teaching them to use tools like ChatGPT. Sander also led a team behind Prompt Report, the most comprehensive study of prompting ever done, co-authored with researchers from the University of Maryland, OpenAI, Microsoft, Google, Princeton, Stanford, and other leading institutions. This 76-page survey analyzed 1,500+ academic papers and covered 200+ prompting techniques.

🟒 Filtering

🟒 Instruction Defense

🟒 Separate LLM Evaluation

🟒 Other Approaches

🟒 Post-Prompting

🟒 Random Sequence Enclosure

🟒 Sandwich Defense

🟒 XML Tagging

Footnotes

  1. Crothers, E., Japkowicz, N., & Viktor, H. (2022). Machine Generated Text: A Comprehensive Survey of Threat Models and Detection Methods. ↩

  2. Goodside, R. (2022). GPT-3 Prompt Injection Defenses. https://twitter.com/goodside/status/1578278974526222336?s=20&t=3UMZB7ntYhwAk3QLpKMAbw ↩

Edit this page
Word count: 0

Get AI Certified by Learn Prompting


Copyright Β© 2024 Learn Prompting.