🔓 破解提示🟢 防御措施🟢 Filtering

🟢 Filtering

Last updated on August 7, 2024 by Sander Schulhoff

Filtering is a common technique for preventing prompt hacking. There are a few types of filtering, but the basic idea is to check for words and phrase in the initial prompt or the output that should be blocked. You can use a blocklist or an allowlist for this purpose. A blocklist is a list of words and phrases that should be blocked, and an allowlist is a list of words and phrases that should be allowed.

Footnotes

  1. Kang, D., Li, X., Stoica, I., Guestrin, C., Zaharia, M., & Hashimoto, T. (2023). Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks.

  2. Selvi, J. (2022). Exploring Prompt Injection Attacks. https://research.nccgroup.com/2022/12/05/exploring-prompt-injection-attacks/

Edit this page
Word count: 0

Get AI Certified by Learn Prompting


Copyright © 2024 Learn Prompting.