Announcing our new Course: AI Red-Teaming and AI Safety Masterclass

Check it out →
🔓 Prompt Hacking🟢 Defensive Measures🟢 Filtering

🟢 Filtering

Last updated on August 7, 2024 by Sander Schulhoff

Filtering is a common technique for preventing prompt hacking1. There are a few types of filtering, but the basic idea is to check for words and phrase in the initial prompt or the output that should be blocked. You can use a blocklist or an allowlist for this purpose2. A blocklist is a list of words and phrases that should be blocked, and an allowlist is a list of words and phrases that should be allowed.

Footnotes

  1. Kang, D., Li, X., Stoica, I., Guestrin, C., Zaharia, M., & Hashimoto, T. (2023). Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks.

  2. Selvi, J. (2022). Exploring Prompt Injection Attacks. https://research.nccgroup.com/2022/12/05/exploring-prompt-injection-attacks/

Edit this page
Word count: 0
Copyright © 2024 Learn Prompting.